TECHNICAL FIELD

The present invention relates in general to graphics rendering and more 
particularly to a software-implemented graphics rendering system and method 
designed and optimized for embedded devices (such as mobile computing 
devices) using fixed-point operations including a variable-length fixed point 
representation for numbers and a normalized homogenous coordinates system
for vector operations. 



BACKGROUND OF THE INVENTION 

Three-dimensional (3D) enabled embedded platforms have become
increasingly important due to users' expectations of multimedia-rich
environments in products ranging from DVD players, set-top boxes, Web pads
and mobile computing devices (including handheld computing devices) to
navigational equipment and medical instrumentation. The importance of 3D
rendering is manifested in its ability to provide users with greater and more



MSFT Matter No. 304844.01 



Attorney Docket No. MCS-041-03 



detailed visual information. As users continue to expect equal or nearly equal 
graphics quality on embedded devices as on their desktop systems, applications 
designed to run on embedded platforms continue to converge with their desktop 
equivalents. Thus, the need for 3D graphics rendering is vital in today's 
embedded systems. 

One of the more popular 3D rendering standards available today is 
Direct3D by Microsoft® Corporation. Direct3D is an application programming
interface (API) for manipulating and displaying 3D objects. Direct3D provides
programmers and developers with a way to develop 3D applications that can 
utilize whatever graphics acceleration hardware is installed on the system. 
Direct3D does an excellent job in supporting efficient rendering in desktop 
applications. These desktop systems typically have powerful central processing 
units (CPUs), math coprocessors, and graphics processing units (GPUs). 

Typical graphic rendering standards (such as Direct3D) are implemented 
using floating-point operations (such as transform and lighting). In embedded 
systems, the CPUs may not be powerful enough to support floating-point 
operations and they typically have no coprocessors or GPUs for accelerating the 
floating-point operations. Moreover, the graphics technology in these embedded
platforms generally does not enable a number of key 3D graphics technologies
(such as a vertex shader, a pixel shader, and vertex blending) that are required 
in applications designed for desktop systems. Thus, moving these rendering 
standards that work well on desktop systems directly to embedded platforms is 
not feasible because of the lack of powerful hardware and processing power on 
embedded systems. 

One technique used to overcome the hardware problem in embedded
systems is to implement the graphics rendering in software. However,
floating-point software routines are notoriously slow. Moreover, floating-point
operations are expensive, require large amounts of memory, and have a large
code size. Thus, using floating-point operations in software-implemented
graphics rendering is impractical on an embedded platform. Therefore, there
exists a need for a graphics rendering system that is optimized for operation on
an embedded platform. Moreover, there is a need for a graphics rendering
system that is software-implemented such that powerful hardware and
processing power are not required. There is also a need for a
software-implemented graphics rendering system that is fast and efficient,
requires less memory, and has a small code size, such that the graphics
rendering system is ideal for embedded platforms.

SUMMARY OF THE INVENTION

The invention disclosed herein includes a graphics rendering system and
method that is optimized for use on embedded platforms (such as mobile
computing devices). The graphics rendering system and method are software
implemented and do not require powerful graphics and processing hardware.
Moreover, the graphics rendering system and method use fixed-point operations
instead of floating-point operations for rendering. Using fixed-point operations is
much faster and more efficient than using floating-point operations. In addition,
fixed-point operations may be performed efficiently on less powerful processors
that support only integer mathematics. This means that the graphics rendering
system and method is optimized for embedded platforms and is faster, more
efficient, requires less memory and has a smaller code size than graphics
rendering systems for desktop systems.

The graphics rendering system and method includes a fixed-point
mathematics library and graphics functions that enable efficient graphics
rendering in embedded devices. The fixed-point mathematics library and
graphics functions are generated considering the efficiency, resolution, CPU and
memory of the embedded device. The fixed-point mathematics library includes
optimized basic functions such as addition, subtraction, multiplication and
division, as well as vertex operations, matrix operations, transform functions,
lighting functions, and graphics functions. The data structure definition,
mathematical operations, and graphics functions are optimized for embedded
platforms. The mathematical library and graphics functions are modified and
optimized by using a variable-length fixed-point representation and a normalized
homogenous coordinate system (NHCS) for vector operations. Using NHCS
solves the fixed-point overflow problem. The graphics rendering system and
method achieves a higher efficiency using software rendering and fixed-point
NHCS representation without graphics hardware than traditional floating-point
rendering with powerful graphics hardware.

The NHCS graphics rendering method disclosed herein includes inputting 
rendering data in a floating-point format, fixed-point format, or both. The 
rendering data then is converted into a variable-length fixed-point format having a 
normalized homogenous coordinate system (NHCS). This converts the input 
rendering data into a NHCS fixed-point format. The NHCS fixed-point format 
allows computations and operations to be performed on the converted rendering 
data such that a range can be predicted. Any data outside of the range is 
truncated. This processing of the data in the NHCS fixed-point format allows 
more efficient use of valuable memory and processing power. A NHCS
fixed-point data structure then is defined to characterize the converted rendering
data, and a fixed-point math library is used to process the rendering data in the NHCS
fixed-point data structure. The math library includes mathematical operations and 
graphics functions. The processed rendering data then is ready for rendering by 
a rendering engine. 

Conversion of the input rendering data into a NHCS fixed-point format is 
performed as follows. First, values are input and a maximum value is determined 
from among all of the input data. Next, a maximum fixed-point buffer size for a 
destination buffer is determined. Next, the maximum value is scaled to the 
maximum fixed-point buffer size and the number of digits that the value is shifted 
is recorded. Using this shift digit, the remainder of the values is normalized and 
the output is the input rendering data in a NHCS fixed-point format. 
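
By way of illustration only, the conversion steps above can be sketched in C as follows. This is a minimal sketch under stated assumptions and is not the disclosed implementation: the names nhcs_vec4 and nhcs_from_float, the choice of four components per vector, and the dest_bits parameter (the width of the destination buffer, sign bit included) are hypothetical. The sketch finds the maximum magnitude, scales it to fill the destination width, records the shift digit, and applies the same shift to the remaining values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: four components sharing one shift count. */
typedef struct {
    int32_t v[4];   /* components scaled by 2^shift */
    int     shift;  /* shift digit recorded during conversion */
} nhcs_vec4;

/* Convert four floating-point values to a shared-shift fixed-point vector.
   dest_bits is the assumed destination width in bits, sign bit included. */
static nhcs_vec4 nhcs_from_float(const float in[4], int dest_bits)
{
    nhcs_vec4 out = { {0, 0, 0, 0}, 0 };
    float max_abs = 0.0f;
    float scale = 1.0f;
    float limit;
    size_t i;

    assert(dest_bits >= 2 && dest_bits <= 32);
    limit = (float)((uint32_t)1 << (dest_bits - 1));

    /* Determine the maximum magnitude among all input values. */
    for (i = 0; i < 4; ++i) {
        float a = in[i] < 0.0f ? -in[i] : in[i];
        if (a > max_abs) max_abs = a;
    }
    if (max_abs == 0.0f) return out;

    /* Scale the maximum so that limit/2 <= max_abs * scale < limit and
       record the shift. The bound of 60 guards against extreme inputs. */
    while (max_abs * scale >= limit && out.shift > -60) {
        scale *= 0.5f;
        --out.shift;
    }
    while (max_abs * scale < limit * 0.5f && out.shift < 60) {
        scale *= 2.0f;
        ++out.shift;
    }

    /* Normalize every value with the same shift (truncating toward zero). */
    for (i = 0; i < 4; ++i)
        out.v[i] = (int32_t)(in[i] * scale);
    return out;
}
```

In this sketch, converting the values 1.5, -3.0, 0.25 and 1.0 for a 16-bit destination records a shift of 13, because the largest magnitude (3.0) multiplied by 2^13 is the largest scaling that still fits in 15 magnitude bits.
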

The NHCS graphics rendering system disclosed herein includes a task
module, an application program interface (API) module, and a driver module.
The task module inputs raw rendering data and converts the data into a desired
fixed-point format. In some embodiments, the task module is capable of
converting the input rendering data into either a traditional fixed-point format or a
preferred NHCS fixed-point format. The API module creates buffers for storing
the converted data. In addition, the API module prepares a command buffer for
the driver module. The driver module contains mathematical operations and
graphics functions to prepare the data for rendering. The data is in a fixed-point
format (preferably a NHCS fixed-point format) and the mathematical operations
and graphics functions are specially created to process the fixed-point data. The
output is the processed rendering data that is ready to be rendered by a
rendering engine.

The task module includes a math library and translator that converts input
rendering data and performs preliminary mathematical operations on the
converted data. In addition, the math library and translator defines a specific
data structure for the converted data. The API module includes an index buffer
for storing indices and a vertex buffer for storing vertex information. The API
module also includes a wrapper that packages commands and provides
convenience, compatibility and security for the commands. This ensures that the
commands are ready for the driver module. A command buffer residing on the
API module stores the wrapper prior to the commands being sent to the driver
module.

The driver module prepares data for the raster by translating the data into the
language of the computing device's graphics hardware. The driver module
includes a transform and lighting (T&L) module and a rasterizer. The T&L
module includes all necessary mathematical operations and graphics functions in
a NHCS fixed-point data format for processing the converted rendering data.
The rasterizer prepares the processed rendering data to be sent to the raster.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention can be further understood by reference to the 
following description and attached drawings that illustrate aspects of the 
invention. Other features and advantages will be apparent from the following
detailed description of the invention, taken in conjunction with the accompanying 
drawings, which illustrate, by way of example, the principles of the present 
invention. 

Referring now to the drawings in which like reference numbers represent
corresponding parts throughout:

FIG. 1 is a block diagram illustrating a general overview of the normalized
homogenous coordinate system (NHCS) graphics rendering system disclosed
herein.

FIG. 2 illustrates an example of a suitable computing system environment
in which the NHCS graphics rendering system and method may be implemented.

FIG. 3 is a block diagram illustrating the details of an exemplary
implementation of the NHCS graphics rendering system shown in FIG. 1.

FIG. 4 is a general flow diagram illustrating the operation of the NHCS
graphics rendering method of the NHCS graphics rendering system shown in
FIG. 1.

FIG. 5 is a detailed flow diagram illustrating the operation of the
conversion process of the task module shown in FIGS. 1 and 3.

FIG. 6 is a working example of the conversion process shown in FIG. 5.

FIG. 7 is a detailed flow diagram illustrating the operation of the API
module shown in FIGS. 1 and 3.

FIG. 8 is a detailed flow diagram illustrating the operation of the driver
module shown in FIGS. 1 and 3.

FIG. 9 illustrates an exemplary implementation of a buffer to store culling
planes.

FIGS. 10A and 10B illustrate an exemplary implementation of normalized
vectors in a Direct3D mobile (D3DM) Phong Model.

DETAILED DESCRIPTION OF THE INVENTION 

In the following description of the invention, reference is made to the
accompanying drawings, which form a part thereof, and in which is shown by
way of illustration a specific example whereby the invention may be practiced. It
is to be understood that other embodiments may be utilized and structural
changes may be made without departing from the scope of the present invention.

I. General Overview

Embedded platforms (such as mobile computing devices) often have
hardware that does not support intensive graphics rendering. In particular, a
mobile computing device may have a central processing unit (CPU) with limited
processing power and lack a coprocessor or graphics processing unit (GPU).
This type of hardware, which is found on most mobile computing devices,
typically will not support floating-point operations that are commonly used in
graphics rendering. This severely limits the usefulness and desirability of mobile
computing devices.

The NHCS graphics rendering system and method disclosed herein is
implemented in software and uses a fixed-point representation of numbers
instead of traditional floating-point representation. Using a fixed-point
representation is a much faster way to perform mathematical operations and can
easily be optimized for use on a mobile computing device. The NHCS graphics
rendering system and method disclosed herein includes an optimized fixed-point
math library that enables efficient and fast graphics rendering in embedded
devices. The math library includes fixed-point mathematical operations and
graphics functions. The data structure, mathematical operations, and graphics
functions are optimized for embedded platforms by using a variable-length
fixed-point representation and a normalized homogenous coordinates system
(NHCS) for vector operations. The NHCS graphics rendering system and method
is software-based and can easily be implemented in existing mobile computing
devices without hardware modification.

FIG. 1 is a block diagram illustrating a general overview of the NHCS
graphics rendering system 100 disclosed herein. The system 100 typically
resides on a computing device 110, such as a mobile computing device. In
general, the system 100 inputs raw rendering data 120, processes the data 120
and outputs processed rendering data 130 suitable for rendering by a rendering
engine (not shown). The raw rendering data 120 typically is in a floating-point
format.

As shown in FIG. 1, the NHCS graphics rendering system 100 includes a
task module 140, an application program interface (API) module 150, and a
driver module 160. The task module 140 inputs the raw rendering data 120 in a
floating-point format and converts the data 120 into a desired fixed-point format.
In some embodiments, the task module 140 is capable of converting the data
120 in a floating-point format into either a traditional fixed-point format or a
preferred NHCS fixed-point format. The converted data then is sent to the API
module 150. The API module 150 creates buffers for storing the converted data.
In addition, the API module 150 prepares a command buffer for the driver module
160. The driver module 160 contains the mathematical operations and graphics
functions to prepare the data for rendering. The data is in a fixed-point format
(preferably a NHCS fixed-point format) and the mathematical operations and
graphics functions are specially created to process the fixed-point data. The
output is the processed rendering data 130 that is ready to be rendered by a
rendering engine.

II. Exemplary Operating Environment 

The NHCS graphics rendering system and method disclosed herein is
designed to operate in a computing environment. The following discussion is
intended to provide a brief, general description of a suitable computing
environment in which the NHCS graphics rendering system and method may be
implemented.

FIG. 2 illustrates an example of a suitable computing system environment
200 in which the NHCS graphics rendering system and method may be
implemented. The computing system environment 200 is only one example of a
suitable computing environment and is not intended to suggest any limitation as
to the scope of use or functionality of the invention. Neither should the
computing environment 200 be interpreted as having any dependency or
requirement relating to any one or combination of components illustrated in the
exemplary operating environment 200.

The NHCS graphics rendering system and method is operational with
numerous other general purpose or special purpose computing system
environments or configurations. Examples of well known computing systems,
environments, and/or configurations that may be suitable for use with the NHCS
graphics rendering system and method include, but are not limited to, personal
computers, server computers, hand-held, laptop or mobile computer or
communications devices such as cell phones and PDAs, multiprocessor
systems, microprocessor-based systems, set top boxes, programmable
consumer electronics, network PCs, minicomputers, mainframe computers,
distributed computing environments that include any of the above systems or
devices, and the like.

The NHCS graphics rendering system and method may be described in
the general context of computer-executable instructions, such as program
modules, being executed by a computer. Generally, program modules include
routines, programs, objects, components, data structures, etc., that perform
particular tasks or implement particular abstract data types. The invention may
also be practiced in distributed computing environments where tasks are
performed by remote processing devices that are linked through a
communications network. In a distributed computing environment, program
modules may be located in both local and remote computer storage media
including memory storage devices.

With reference to FIG. 2, an exemplary system for implementing the NHCS
graphics rendering system and method includes a general-purpose computing
device in the form of a computer 210 (the computer 210 is an example of the
computing device 110 shown in FIG. 1).

Components of the computer 210 may include, but are not limited to, a
processing unit 220, a system memory 230, and a system bus 221 that couples
various system components including the system memory to the processing unit
220. The system bus 221 may be any of several types of bus structures
including a memory bus or memory controller, a peripheral bus, and a local bus
using any of a variety of bus architectures. By way of example, and not
limitation, such architectures include Industry Standard Architecture (ISA) bus,
Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video
Electronics Standards Association (VESA) local bus, and Peripheral Component
Interconnect (PCI) bus also known as Mezzanine bus.

The computer 210 typically includes a variety of computer readable media.
Computer readable media can be any available media that can be accessed by
the computer 210 and includes both volatile and nonvolatile media, removable
and non-removable media. By way of example, and not limitation, computer
readable media may comprise computer storage media and communication
media. Computer storage media includes volatile and nonvolatile removable and
non-removable media implemented in any method or technology for storage of
information such as computer readable instructions, data structures, program
modules or other data.

Computer storage media includes, but is not limited to, RAM, ROM,
EEPROM, flash memory or other memory technology, CD-ROM, digital versatile
disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage devices, or any other medium
which can be used to store the desired information and which can be accessed
by the computer 210. Communication media typically embodies computer
readable instructions, data structures, program modules or other data in a
modulated data signal such as a carrier wave or other transport mechanism and
includes any information delivery media.

Note that the term "modulated data signal" means a signal that has one or
more of its characteristics set or changed in such a manner as to encode
information in the signal. By way of example, and not limitation, communication
media includes wired media such as a wired network or direct-wired connection,
and wireless media such as acoustic, RF, infrared and other wireless media.
Combinations of any of the above should also be included within the scope of
computer readable media.

The system memory 230 includes computer storage media in the form of
volatile and/or nonvolatile memory such as read only memory (ROM) 231 and
random access memory (RAM) 232. A basic input/output system 233 (BIOS),
containing the basic routines that help to transfer information between elements
within the computer 210, such as during start-up, is typically stored in ROM 231.
RAM 232 typically contains data and/or program modules that are immediately
accessible to and/or presently being operated on by processing unit 220. By way
of example, and not limitation, FIG. 2 illustrates operating system 234,
application programs 235, other program modules 236, and program data 237.

The computer 210 may also include other removable/non-removable,
volatile/nonvolatile computer storage media. By way of example only, FIG. 2
illustrates a hard disk drive 241 that reads from or writes to non-removable,
nonvolatile magnetic media, a magnetic disk drive 251 that reads from or writes
to a removable, nonvolatile magnetic disk 252, and an optical disk drive 255 that
reads from or writes to a removable, nonvolatile optical disk 256 such as a
CD-ROM or other optical media.

Other removable/non-removable, volatile/nonvolatile computer storage
media that can be used in the exemplary operating environment include, but are
not limited to, magnetic tape cassettes, flash memory cards, digital versatile
disks, digital video tape, solid state RAM, solid state ROM, and the like. The
hard disk drive 241 is typically connected to the system bus 221 through a
non-removable memory interface such as interface 240, and magnetic disk drive
251 and optical disk drive 255 are typically connected to the system bus 221 by a
removable memory interface, such as interface 250.

The drives and their associated computer storage media discussed above
and illustrated in FIG. 2 provide storage of computer readable instructions, data
structures, program modules and other data for the computer 210. In FIG. 2, for
example, hard disk drive 241 is illustrated as storing operating system 244,
application programs 245, other program modules 246, and program data 247.
Note that these components can either be the same as or different from
operating system 234, application programs 235, other program modules 236,
and program data 237. Operating system 244, application programs 245, other
program modules 246, and program data 247 are given different numbers here to
illustrate that, at a minimum, they are different copies. A user may enter
commands and information into the computer 210 through input devices such as
a keyboard 262 and pointing device 261, commonly referred to as a mouse,
trackball or touch pad.

Other input devices (not shown) may include a microphone, joystick, game
pad, satellite dish, scanner, radio receiver, or a television or broadcast video
receiver, or the like. These and other input devices are often connected to the
processing unit 220 through a user input interface 260 that is coupled to the
system bus 221, but may be connected by other interface and bus structures,
such as, for example, a parallel port, game port or a universal serial bus (USB).
A monitor 291 or other type of display device is also connected to the system bus
221 via an interface, such as a video interface 290. In addition to the monitor,
computers may also include other peripheral output devices such as speakers
297 and printer 296, which may be connected through an output peripheral
interface 295.

The computer 210 may operate in a networked environment using logical
connections to one or more remote computers, such as a remote computer 280.
The remote computer 280 may be a personal computer, a server, a router, a
network PC, a peer device or other common network node, and typically includes
many or all of the elements described above relative to the computer 210,
although only a memory storage device 281 has been illustrated in FIG. 2. The
logical connections depicted in FIG. 2 include a local area network (LAN) 271
and a wide area network (WAN) 273, but may also include other networks. Such
networking environments are commonplace in offices, enterprise-wide computer
networks, intranets and the Internet.

When used in a LAN networking environment, the computer 210 is
connected to the LAN 271 through a network interface or adapter 270. When
used in a WAN networking environment, the computer 210 typically includes a
modem 272 or other means for establishing communications over the WAN 273,
such as the Internet. The modem 272, which may be internal or external, may be
connected to the system bus 221 via the user input interface 260, or other
appropriate mechanism. In a networked environment, program modules
depicted relative to the computer 210, or portions thereof, may be stored in the
remote memory storage device. By way of example, and not limitation, FIG. 2
illustrates remote application programs 285 as residing on memory device 281.
It will be appreciated that the network connections shown are exemplary and
other means of establishing a communications link between the computers may
be used.

III. System Components

FIG. 3 is a block diagram illustrating the details of an exemplary
implementation of the NHCS graphics rendering system 100 shown in FIG. 1. In
this exemplary implementation, the NHCS graphics rendering system 100 is
implemented in a Direct3D mobile environment. Microsoft® Corporation in
Redmond, Washington, developed Direct3D (D3D) and it has become a
rendering standard. Traditionally, D3D supports efficient rendering in desktop
personal computer (PC) applications. These PCs typically have powerful CPUs
and GPUs and can support intensive graphics rendering. For embedded
devices, such as mobile computing devices, D3D does not fit because it needs
powerful processing units. The NHCS graphics rendering system and method
disclosed herein enables the use of D3D on mobile computing devices (D3DM).
The NHCS graphics rendering system and method includes a powerful
software-based fixed-point mathematical library and corresponding graphics
functions. The mathematical library is optimized for use on mobile computing
devices and makes efficient use of the limited resources available on mobile
computing devices. The data structure definition, mathematical operations, and
graphics functions are specially designed and optimized for D3DM by using
variable-length fixed-point representation and NHCS for vector operations.

The basic structure of D3DM is that there is a "thin" API module and a
"thick" driver module. In the thin API module, the interface is simple and
straightforward. Thus, the API code provides integration with the operating
system and hosting for the display driver, but does not provide any actual
drawing code. Most of the work is forwarded by the API module to the thick
driver module and performed there. Thus, the thick driver module
includes drawing code, which may only be overridden by the display driver.

The design of D3DM is based on the fact that models can be described in
terms of primitives. In turn, each primitive is described in terms of a plurality of
vertexes (or vertices). A vertex is the point at which two lines meet. The vertex
carries a great deal of information. For example, the vertex contains the 3-D
coordinates and weight. In addition, there is color information, often specified in
the form of a diffuse and a specular color. This color data is commonly coded in
the "RGBA" format (for red, green, blue and alpha). The vertex also contains a
normal, the vector that is orthogonal to its surface, and the texture coordinates
that represent the texture and its position for the vertex. The vertex may have
several texture coordinates in case more than one texture is applied to the
vertex. Further, the vertex may contain texture fog as well as other information
such as point size. Thus, the vertex, the smallest unit in a 3-D scene, contains a
large amount of information.
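
For illustration only, the per-vertex information listed above might be gathered into a C structure such as the one sketched below. The structure name, the field names, the 32-bit fixed-point field widths, the limit of two texture stages, and the byte order used by pack_rgba are all assumptions made for this sketch; they are not taken from D3DM or from the data structure definitions of the math library.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TEX_STAGES 2  /* assumed limit; not from the description */

/* Hypothetical vertex layout holding the attributes named in the text. */
typedef struct {
    int32_t  position[3];                  /* 3-D coordinates, fixed point */
    int32_t  weight;                       /* vertex weight */
    uint32_t diffuse;                      /* packed RGBA diffuse color */
    uint32_t specular;                     /* packed RGBA specular color */
    int32_t  normal[3];                    /* vector orthogonal to the surface */
    int32_t  texcoord[MAX_TEX_STAGES][2];  /* one (u, v) pair per texture */
    int32_t  fog;                          /* fog value */
    int32_t  point_size;                   /* point size */
} vertex_sketch;

/* Pack four 8-bit channels into one word, red in the top byte (assumed order). */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Store the texture coordinates for one texture stage. */
static void vertex_set_texcoord(vertex_sketch *vx, unsigned stage,
                                int32_t u, int32_t v)
{
    assert(stage < MAX_TEX_STAGES);
    vx->texcoord[stage][0] = u;
    vx->texcoord[stage][1] = v;
}
```

Keeping each color in a single packed word, as assumed here, keeps the structure compact, which matters when many vertices are held in a vertex buffer on a memory-constrained device.
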

D3DM loads this data for the vertices into vertex buffers. The data then is
processed by a transform and lighting (T&L) pipeline where the output is pixel
color values for a frame buffer. The NHCS graphics rendering system and
method contains a mathematical library that is used by D3DM for two purposes.
First, the mathematical library is used to translate floating-point data into a NHCS
fixed-point format and perform some necessary mathematical operations.
Second, the mathematical library is used to implement the transform and lighting.
The D3DM drivers expose all of the features of the mobile computing device on
which an application is running, thus achieving maximum drawing performance.

Referring to FIG. 3, the task module 140 includes a math library and
translator 300, an application 305, and floating-point data 310. In general, the
task module 140 inputs the floating-point data 310 and converts the data 310 into
a fixed-point format or a NHCS fixed-point format. The converted data then is
sent to buffers created by the API module 150. The math library and translator
300 converts the data 310 and performs preliminary mathematical operations on
the converted data. In addition, the math library and translator 300 defines a
specific data structure for the converted data. The preliminary mathematical
operations and data structure definitions are discussed in detail below.

The API module 150 creates buffers for storing the converted data and
preparing the data for the driver module 160. The API module 150 includes an
index buffer 315, for storing indices, and a vertex buffer 320, for storing vertex
information. The index buffer holds a value for each vertex. The value is called
an index. Indices are used to retrieve a vertex in the vertex buffer. Each index is
an offset in the current vertex buffer of the data for this vertex. This allows for the
sharing of vertex data between multiple vertices and avoids the duplicated
storage of vertices when two neighboring triangles share vertices. The API
module 150 also includes commands 325 that provide instructions for the
rendering and texture 330 that provides texture information. The API module 150
includes a wrapper 335 that packages the commands 325 and provides
convenience, compatibility and security for the commands 325. This ensures
that the commands 325 are ready for the driver module 160. A command buffer
340 stores the wrapper 335 prior to the commands 325 being sent to the driver
module 160.
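
The relationship between the index buffer and the vertex buffer described above can be illustrated with a short C sketch. The type vtx, the function fetch_vertex, the sample data, and the use of 16-bit indices are assumptions chosen for this example and do not reflect the actual D3DM buffer layout. Two neighboring triangles that share an edge are drawn from four stored vertices and six indices, so the two shared vertices are stored only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertex holding only a fixed-point position (16 fractional bits). */
typedef struct { int32_t x, y, z; } vtx;

/* Four stored vertices of a unit square. */
static const vtx quad_vb[4] = {
    { 0,       0,       0 },
    { 1 << 16, 0,       0 },
    { 0,       1 << 16, 0 },
    { 1 << 16, 1 << 16, 0 }
};

/* Six indices describe two triangles; indices 1 and 2 appear in both. */
static const uint16_t quad_ib[6] = { 0, 1, 2,   2, 1, 3 };

/* Look up the vertex for one triangle corner through the index buffer. */
static const vtx *fetch_vertex(const vtx *vb, size_t vb_count,
                               const uint16_t *ib, size_t ib_count,
                               size_t corner)
{
    uint16_t idx;
    assert(corner < ib_count);
    idx = ib[corner];        /* the index is an offset into the vertex buffer */
    assert(idx < vb_count);
    return &vb[idx];
}
```

In this sketch, corners 2 and 3 resolve to the same stored vertex, which is how duplicated storage is avoided when neighboring triangles share vertices.
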

The driver module 160 prepares data for the raster. In addition, the driver
module 160 prepares the data for use by a rendering engine. This means that
the data is translated into the language of the computing device's graphics
hardware and causes particular primitives to be drawn. The driver module 160
includes a transform and lighting (T&L) module 345 and a rasterizer 350. The
T&L module 345 includes all necessary mathematical operations and graphics
functions in the NHCS fixed-point data format. These mathematical operations
and graphics functions are discussed in detail below. The rasterizer prepares the
rendering data to be sent to the raster.

IV. Components Details 

As stated above, the math library and translator 300 converts the data 310
and performs preliminary mathematical operations on the converted data. In
addition, the math library and translator 300 defines a specific data structure for
the converted data. The basic mathematical operations of fixed-point data, the
NHCS fixed-point format, and the data structure definitions of the math library
and translator 300 now will be addressed.

Fixed Point Mathematical Operations
The basic mathematical operations performed by the math library and
translator module 300 include addition (+), subtraction (-), multiplication (*) and
division (/). Each of these basic functions is optimized to achieve greater efficiency
in software rendering than can be had with traditional floating-point rendering with
graphics hardware. Each of these optimized basic mathematical operations will
now be discussed.

Addition 

Most central processing units (CPUs) designed for mobile computing devices
support integer addition. For example, CPUs designed for use with D3DM
support integer addition. When adding two fixed-point numbers having the same
number of mantissa bits, integer addition can be used. However, care is required to
avoid the overflow problem. In addition, care must also be used when adding
signed and unsigned fixed-point data.

The basic algorithm of the addition of fixed-point numbers assumes that
operand A and operand B are both fixed-point data with an m-bit mantissa. In this
situation,

C = A+B

will also be fixed-point data with an m-bit mantissa. It should be noted that overflow
is possible in fixed-point addition.
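
As a minimal sketch of this rule in C, two values with the same number of mantissa bits can be added with a plain integer addition. The helper names `to_fixed` and `add_fixed` are assumptions made for this example. `SFIX32` follows the type naming used later in this description.

```c
#include <stdint.h>

typedef int32_t SFIX32;   /* signed 32-bit fixed-point storage */

/* Convert a float to fixed point with m mantissa (fractional) bits. */
static SFIX32 to_fixed(float v, int m)
{
    return (SFIX32)(v * (float)((SFIX32)1 << m));
}

/* Both operands must carry the same m; the sum then also carries m.
   No overflow check is made, matching the behavior described in the text. */
static SFIX32 add_fixed(SFIX32 a, SFIX32 b)
{
    return a + b;
}
```

With m = 16, adding the fixed-point forms of 1.5 and 2.25 gives the fixed-point form of 3.75 without any floating-point addition.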

Overflow in a signed integer is different from overflow in an unsigned
integer. By way of example, given two unsigned 32-bit integers,

0x7FFFFFFF + 0x7FFFFFFF = 0xFFFFFFFE

there is no overflow. However, if the same data is added as signed 32-bit
integers, overflow occurs into the sign bit. Most compilers can distinguish a signed
integer from an unsigned integer, so there typically is no need to address this
situation. However, when writing code or programs in assembly language
(ASM), the difference between signed and unsigned must be taken into account.
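
The difference can be made concrete with two small range checks in C. This is an illustrative sketch only; the function names are assumptions, and the signed check is written so that it never forms the overflowing sum, because signed overflow is undefined in C.

```c
#include <stdint.h>

/* Returns 1 if a + b does not fit in an unsigned 32-bit integer. */
static int add_overflows_u32(uint32_t a, uint32_t b)
{
    return a > UINT32_MAX - b;
}

/* Returns 1 if a + b does not fit in a signed 32-bit integer.
   The sum itself is never computed. */
static int add_overflows_s32(int32_t a, int32_t b)
{
    return (b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b);
}
```

For the operands 0x7FFFFFFF and 0x7FFFFFFF, the unsigned check reports no overflow while the signed check reports overflow, in line with the example in the text.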

The addition of signed and unsigned data is only appropriate when the
signed operand is positive. The result can be saved as a signed or unsigned
number, as long as no overflow occurs. The addition of integers of different bit widths
requires alignment. For example, when adding a 32-bit integer to a 16-bit
integer, the 16-bit integer must be aligned to the 32-bit integer. Provided that the
mantissa bits in the operands are the same, the addition will be correct. It should be
noted that when coding in C++ the C++ compiler will automatically perform the
alignments, but when coding using ASM the need for alignment must be
recognized.

Addition results larger than the maximum or less than the minimum will
cause an overflow. For performance reasons, the mathematical operations of the NHCS graphics
rendering system 100 do not deal with overflow.
In such cases, the operands can be pre-shifted before adding to avoid overflow.
In a working example of the NHCS graphics rendering system 100, the following
are the maximums and minimums:
♦ For a 32-bit signed integer, the maximum is 0x7FFF FFFF, and the
minimum is 0x8000 0000.
♦ For a 32-bit unsigned integer, the maximum is 0xFFFF FFFF, and the
minimum is 0x0000 0000.

Subtraction

Integer subtraction is supported on mobile computing devices using
D3DM. Integer subtraction can be used when subtracting two fixed-point data
with the same number of mantissa bits. Once again, care is required to avoid the
overflow problem. Moreover, care must also be used when subtracting signed
and unsigned fixed-point data.

The basic algorithm of the subtraction of fixed-point numbers assumes
that operand A and operand B are both fixed-point data with an m-bit mantissa.
In this situation,

C = A-B

is also fixed-point data with an m-bit mantissa. Overflow also is possible in
subtraction.

Subtraction results larger than the maximum or less than the minimum will
cause an overflow. For performance reasons, the NHCS graphics rendering system 100 does not deal with
overflow. In such cases, the operands are pre-shifted
before subtracting to avoid overflow. In a working example of the NHCS
graphics rendering system 100, the following are the maximums and minimums:

♦ For a 32-bit signed integer, the maximum is 0x7FFF FFFF, and the
minimum is 0x8000 0000.
♦ For a 32-bit unsigned integer, the maximum is 0xFFFF FFFF, and the
minimum is 0x0000 0000.

Multiplication 

Integer multiplication also is supported on mobile computing devices using
D3DM. When multiplying two fixed-point numbers, the intermediate result is
stored in a double-width buffer. Overflow may appear when the double-width buffer is
truncated to a single-width buffer.

The basic algorithm of the multiplication of fixed-point numbers assumes
that operand A is n-bit fixed-point data with an a-bit mantissa, and operand B is
n-bit fixed-point data with a b-bit mantissa. In this situation,

C = A*B

is 2n-bit fixed-point data with an (a+b)-bit mantissa.

Overflow in multiplication happens when C is truncated to a smaller storage size.
This may occur when it is desired to truncate to the same n bits as the
operands. In this truncation, both overflow and underflow are possible. To avoid
this overflow, the multiplication principle is followed, which states that intermediate
results should not be truncated. This can cause problems if three 32-bit
operands are multiplied sequentially. At the first multiplication, a 64-bit
intermediate result is obtained. Next, the 64-bit intermediate result is multiplied
with the third 32-bit operand, which produces a 96-bit result.

In the NHCS graphics rendering system and method, the overflow is
handled as follows. First, after each multiplication the 64-bit intermediate result
is truncated to 32 bits. This assumes that no overflow can occur in the
truncation. A second solution is to apply NHCS to all operands to reduce their bits,
say, from 32 bits to 16 bits. Then the sequential multiplication of three integers
yields a 48-bit result. Of course, this will result in some loss of precision, but it
is useful if the sign of the final result is needed. This need may occur, for
example, in back face culling.
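
The first approach (a 64-bit intermediate result truncated back to 32 bits) can be sketched in C as shown below. The function name `mul_sfix32` is an assumption for this example. The sketch also assumes that the shifted result fits in 32 bits and that the compiler performs an arithmetic right shift on negative values, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

typedef int32_t SFIX32;
typedef int64_t SFIX64;

/* Multiply a value with a mantissa bits by a value with b mantissa bits.
   The 64-bit product carries (a + b) mantissa bits; shifting right by n
   leaves (a + b - n) mantissa bits in the truncated 32-bit result. */
static SFIX32 mul_sfix32(SFIX32 x, SFIX32 y, int n)
{
    SFIX64 wide = (SFIX64)x * (SFIX64)y;   /* full-width intermediate result */
    return (SFIX32)(wide >> n);            /* truncate; assumes the value fits */
}
```

With 16 mantissa bits in both operands and n = 16, the result again has 16 mantissa bits, so 1.5 times 2.0 yields the fixed-point form of 3.0.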



Division 

Division is a common operation, but it is expensive in CPU time.
Moreover, in some embedded CPUs division is not supported in hardware at all.
The NHCS graphics rendering system and method address this by implementing
division in software.

The basic algorithm of the division of fixed-point numbers assumes that
operand A is n-bit fixed-point data with an a-bit mantissa, and operand B is n-bit
fixed-point data with a b-bit mantissa. In this situation,

C = A/B

is n-bit fixed-point data with an (a-b)-bit mantissa. It can be seen that division
loses precision, so the dividend must be shifted to increase its mantissa bits
before division. If a result is needed with a c-bit mantissa, the operand A must be
pre-shifted left by c-(a-b) bits. This pre-shift can cause operand A to overflow if A
is stored in n bits. Generally, A is converted to a 2n-bit integer before the pre-shift.
For division by a constant, the divisor can be converted to its reciprocal, so that
division becomes multiplication. For non-constant division, the basic operation is the
reciprocal, and a method such as Newton's iteration method can be used.
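
The pre-shift rule can be sketched in C as follows. The function name `div_sfix32` and its parameter order are assumptions for this example; the sketch uses the C division operator on a widened dividend and assumes that c-(a-b) is non-negative and that the quotient fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t SFIX32;
typedef int64_t SFIX64;

/* Divide A (a mantissa bits) by B (b mantissa bits), producing a quotient
   with c mantissa bits. A is widened to 64 bits and pre-shifted left by
   c - (a - b) bits before the division. */
static SFIX32 div_sfix32(SFIX32 A, int a, SFIX32 B, int b, int c)
{
    int shift = c - (a - b);
    assert(B != 0 && shift >= 0 && shift < 32);
    /* Multiply by a power of two rather than shifting, so that a negative
       dividend stays well defined in C. */
    SFIX64 wide = (SFIX64)A * ((SFIX64)1 << shift);
    return (SFIX32)(wide / B);
}
```

For two 16.16 operands and a 16.16 result, the pre-shift is 16 bits, so 3.0 divided by 2.0 yields 0x18000 (1.5).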

Given a, the desire is to obtain 1/a. The target function is:

$$f(x) = \frac{1}{x} - a$$

The iteration is:

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} = x_i - \frac{\frac{1}{x_i} - a}{-\frac{1}{x_i^2}} = x_i (2 - a x_i)$$

Each iteration step involves 2 multiplications and 1 subtraction, and gives
twice the precision. Given a 256-item array for the initial guess, 32-bit
precision can be obtained with two iterations. The division latency for 32-bit
precision is 1 memory lookup, 5 multiplications, and 2 subtractions.

Overflow does not occur in division for non-preshifted integers. However,
when dividing a 64-bit integer by a 32-bit integer and storing the result in a 32-bit
integer, overflow may occur. This may happen when the dividend is pre-shifted.
Moreover, underflow can occur if an inappropriate number of mantissa bits is
chosen for the result. As with other forms of division, a zero divisor should be prevented.
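
The Newton iteration for the reciprocal can be sketched in fixed point as shown below. This is a hedged illustration rather than the routine used by the system: the name `recip_q30`, the choice of 30 mantissa bits, and the restriction of the input to the range 0.5 to 1.0 are assumptions. In place of the 256-item lookup table mentioned in the text, the sketch substitutes a linear initial guess, so it runs four iterations instead of two.

```c
#include <assert.h>
#include <stdint.h>

#define Q 30  /* mantissa bits used by this sketch (an assumption) */

/* Reciprocal of a, where a is Q30 fixed point in [0.5, 1.0].
   Returns an approximation of 1/a in Q30, a value in [1.0, 2.0]. */
static uint32_t recip_q30(uint32_t a)
{
    assert(a >= (1u << 29) && a <= (1u << 30));
    const uint64_t two = (uint64_t)2 << Q;
    /* Linear initial guess x0 = 48/17 - (32/17) * a. */
    uint64_t x = ((uint64_t)48 << Q) / 17
               - (((((uint64_t)32 << Q) / 17) * a) >> Q);
    for (int i = 0; i < 4; ++i) {
        uint64_t ax = (a * x) >> Q;     /* first multiplication */
        x = (x * (two - ax)) >> Q;      /* subtraction and second multiplication */
    }
    return (uint32_t)x;
}
```

Each pass through the loop performs the two multiplications and one subtraction counted in the text. Because every intermediate product is truncated, the result may differ from the exact reciprocal by a few units in the last place.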

Fixed-Point Number Representation 

The NHCS graphics rendering system and method disclosed herein use a
normalized homogenous coordinate system (NHCS) to represent numbers.
NHCS is a high-resolution variation of fixed-point number representation. In
general, fixed-point representation of numbers is a way to represent a
floating-point number using integers. Briefly, representing a number in a floating-point
representation means that the decimal point does not remain in a fixed position.
Instead, the decimal point "floats" such that it always appears immediately
after the first digit. As discussed above, using a floating-point representation on
a mobile device may not be possible due to processor and other hardware
limitations.

The alternative is to use fixed-point number representation that is
executed using integer functions. On mobile, wireless and other embedded
platforms, the CPU may not be powerful enough to support floating-point
operations and there typically are no coprocessors for accelerating the
floating-point operations. Another important issue is that most floating-point software
routines are quite slow. Fixed-point is a much faster way to handle calculations.

Fixed-point number representation is a way to speed up any program that
uses floating point. Typically, some of the bits are used for the whole part of the
number and some bits are used for the fractional part. For example, if there are 32
bits available, a 16.16 configuration means that there are 16 bits before the
decimal point (representing the whole part of the number) and 16 bits after the decimal point
(representing the fractional part of the number). In this case, the value
65535.999984741211 is the largest possible number for the 16.16 configuration.
This is obtained by setting the decimal portion to all 1's (in binary). The value
65535 with 16 bits is obtained for the whole part of the number. If 65535 is
divided by 65536, then the value .999984741211 is obtained for the fractional
part. There are other variants such as 24.8 (24 bits before the decimal point and 8 bits
after) and 8.24 (8 bits before the decimal point and 24 bits after). The configuration
type depends on the amount of precision that an application needs.
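
The 16.16 layout can be illustrated with a few C helpers. The names `FRAC_BITS`, `whole_part`, `frac_part` and `to_double` are assumptions for this sketch, which treats the 32 bits as unsigned so that the largest value matches the figure quoted in the text.

```c
#include <stdint.h>

typedef uint32_t UFIX32;

#define FRAC_BITS 16   /* 16.16 configuration */

/* Upper 16 bits: the whole part of the number. */
static UFIX32 whole_part(UFIX32 v) { return v >> FRAC_BITS; }

/* Lower 16 bits: the fractional part, in units of 1/65536. */
static UFIX32 frac_part(UFIX32 v)  { return v & ((1u << FRAC_BITS) - 1u); }

/* Value of the fixed-point number as a double, for inspection only. */
static double to_double(UFIX32 v)  { return (double)v / (double)(1u << FRAC_BITS); }
```

With all 32 bits set, the whole part is 65535 and the fractional part is 65535/65536, which together give 65535.999984741211.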

In an exemplary embodiment of the optimized NHCS graphics rendering
system and method, Direct3D for mobile devices (D3DM) is used. In order to
use numbers in the D3DM transform and lighting (T&L) module, floating-point
numbers need to be converted to NHCS fixed-point numbers. Preferably, the
conversion is as easy as possible (for example, there is no need to know the range of the
input vertices) while preserving the precision of the data. NHCS fixed-point
number representation achieves these objectives.

NHCS is a type of vertex representation. NHCS can eliminate the
annoying overflow, and provides a wider data space. For example, without
NHCS, the model space vertex coordinates range from 2^-16 to 2^15, assuming that a
16-bit mantissa is used. On the other hand, if NHCS is used, the model space
vertex coordinates range from 2^-31 to 2^31. By adopting NHCS it can be seen that
both range and precision are greatly increased.

NHCS also makes the conversion from floating-point to fixed-point easy.
It is not necessary to know the exact range of the input vertices. NHCS also
eliminates the factitious overflow and takes advantage of the full storage of the
buffer. Moreover, NHCS has the advantage of providing a wider data
representation given the same precision. NHCS also preserves all transform and
lighting (T&L) operations and makes use of the "w" in homogeneous coordinate
representation.

Data Structure for Transform & Lighting
The data structure definition for the NHCS fixed-point format is shown in
the following tables:

Basic type

♦ SFIX64: signed 64-bit integer
♦ UFIX64: unsigned 64-bit integer
♦ SFIX32: signed 32-bit integer
♦ UFIX32: unsigned 32-bit integer
♦ SFIX16: signed 16-bit integer
♦ UFIX16: unsigned 16-bit integer
♦ SFIX8: signed 8-bit integer
♦ UFIX8: unsigned 8-bit integer

Structure type

♦ typedef SFIX64 SFIX64Quad[4]

This data structure is used to store a 4-element vector, and each element
is a 64-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef SFIX64 SFIX64Triple[3]

This data structure is used to store a 3-element vector, and each element
is a 64-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef SFIX32 SFIX32Quad[4]

This data structure is used to store a 4-element vector, and each element
is a 32-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef SFIX32 SFIX32Triple[3]

This data structure is used to store a 3-element vector, and each element
is a 32-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef SFIX16 SFIX16Quad[4]

This data structure is used to store a 4-element vector, and each element
is a 16-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef SFIX16 SFIX16Triple[3]

This data structure is used to store a 3-element vector, and each element
is a 16-bit signed integer. This vector can be either NHCS or non-NHCS.

♦ typedef UFIX8 UFIX8Quad[4]

This data structure is used to store a 4-element vector, and each element
is an 8-bit unsigned integer. This vector is non-NHCS. This vector is used
mainly for representing color RGBA components.

♦ typedef SFIX32 SFIX32Mat4x4[16];

This data structure is used to store a 16-element matrix, which is 4 by 4.
Each element of the matrix is a 32-bit signed integer. This matrix can be either
NHCS or non-NHCS.

Default mantissa bits

The default mantissa bits listed here are for fixed-point data
representation:

♦ #define DEFAULT_SFIX32 16   //default mantissa bits for 32-bit signed
♦ #define ONE_SFIX32 30       //mantissa bits for 32-bit signed within (-1, 1)
♦ #define NORMAL_SFIX16 14    //normal mantissa bits for 16-bit signed
♦ #define TEXTURE_SFIX16 12   //mantissa bits for 16-bit texture coordinate
♦ #define ONE_UFIX16 15       //mantissa bits for 16-bit unsigned within (0, 1)
♦ #define COLOR_UFIX16 8      //color mantissa bits for 16-bit unsigned

Constant

The constants listed here are for integer shifting during computation and
conversion between different data formats:

♦ const SFIX32 SFIX32_1 = (SFIX32)1<<DEFAULT_SFIX32;
♦ const SFIX32 ONE_SFIX32_1 = (SFIX32)1<<ONE_SFIX32;
♦ const SFIX16 NORMAL_SFIX16_1 = (SFIX16)1<<NORMAL_SFIX16;
♦ const int POSTOTEX = ONE_SFIX32 - TEXTURE_SFIX16;
♦ const int NORMTOTEX = NORMAL_SFIX16 - TEXTURE_SFIX16;



The basic operations have the following data structure definition:

Type convert

The following macros are conversion macros for converting between
different data formats:

♦ #define PosToTex(a)         ((SFIX16)((a)>>POSTOTEX))
♦ #define NormToTex(a)        ((SFIX16)((a)>>NORMTOTEX))
♦ #define FloatToSFIX32(a,n)  ((SFIX32)((a)*((SFIX32)1<<(n))))
♦ #define SFIX32ToFloat(a,n)  ((float)(a)/((SFIX32)1<<(n)))
♦ #define FloatToSFIX16(a,n)  ((SFIX16)((a)*((SFIX16)1<<(n))))
♦ #define FloatToUFIX16(a,n)  ((UFIX16)((a)*((UFIX16)1<<(n))))
♦ #define SFIX16ToFloat(a,n)  ((float)(a)/((SFIX16)1<<(n)))
♦ #define FloatToUFIX8(a)     ((UFIX8)((a)*255))

Operations

The following macros are computation macros for computing between
fixed-point data:

♦ #define Mul_SFIX32(a,b,n)   ((SFIX32)(((SFIX64)(a)*(b))>>(n)))
♦ #define Mul_UFIX32(a,b,n)   ((UFIX32)(((UFIX64)(a)*(b))>>(n)))
♦ #define Div_SFIX32(a,b,n)   ((SFIX32)(((SFIX64)(a)<<(n))/(b)))
♦ #define Mul_SFIX16(a,b,n)   ((SFIX16)(((SFIX32)(a)*(b))>>(n)))
♦ #define Mul_UFIX16(a,b,n)   ((UFIX16)(((UFIX32)(a)*(b))>>(n)))
♦ #define Mul_UFIX8(a,b,n)    (((UFIX16)(a)*(b))>>(n))
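
The following C sketch shows how several of the conversion and computation macros might be used together. The basic types are mapped onto `<stdint.h>` types, which is an assumption of this example, as is the helper `mul_via_fixed`. `NORMTOTEX` is written here as a macro rather than a `const int` so that the block stands on its own.

```c
#include <stdint.h>

typedef int64_t SFIX64;
typedef int32_t SFIX32;
typedef int16_t SFIX16;
typedef uint8_t UFIX8;

#define DEFAULT_SFIX32 16
#define NORMAL_SFIX16  14
#define TEXTURE_SFIX16 12
#define NORMTOTEX      (NORMAL_SFIX16 - TEXTURE_SFIX16)

/* Macros transcribed from the listing above. */
#define NormToTex(a)        ((SFIX16)((a)>>NORMTOTEX))
#define FloatToSFIX32(a,n)  ((SFIX32)((a)*((SFIX32)1<<(n))))
#define SFIX32ToFloat(a,n)  ((float)(a)/((SFIX32)1<<(n)))
#define FloatToUFIX8(a)     ((UFIX8)((a)*255))
#define Mul_SFIX32(a,b,n)   ((SFIX32)(((SFIX64)(a)*(b))>>(n)))

/* Hypothetical helper: multiply two floats by way of 16.16 fixed point. */
static float mul_via_fixed(float x, float y)
{
    SFIX32 fx = FloatToSFIX32(x, DEFAULT_SFIX32);
    SFIX32 fy = FloatToSFIX32(y, DEFAULT_SFIX32);
    return SFIX32ToFloat(Mul_SFIX32(fx, fy, DEFAULT_SFIX32), DEFAULT_SFIX32);
}
```

`NormToTex` drops the two extra mantissa bits of a normal-format value, so 1.0 in 14-bit-mantissa form (0x4000) becomes 1.0 in 12-bit-mantissa form (0x1000).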



The data structure definition for the different types of data are as follows:

Input data

Name                                  Type                  Mantissa bits
Model space vertex coordinates        SFIX32Quad            NHCS
Model space normal                    SFIX16Triple          NORMAL_SFIX16
Model space texture coordinates       SFIX16                TEXTURE_SFIX16
Model space diffuse/specular color    DWORD with A8R8G8B8
Vertex/Texture transform matrices     SFIX32Mat4x4          DEFAULT_SFIX32
Light/view vectors for lighting       SFIX32Quad            NHCS
Fog parameters                        SFIX32                DEFAULT_SFIX32
Color in light/material               UFIX8Quad             0
Power in material                     UFIX8                 0

Output data

Name                                  Type                  Mantissa bits
Transformed vertex coordinates (x,y,z) SFIX32               ONE_SFIX32
Transformed vertex coordinates (w)    SFIX32                DEFAULT_SFIX32
Color                                 DWORD with A8R8G8B8
Texture coordinates                   SFIX16                TEXTURE_SFIX16
Fog                                   SFIX32                DEFAULT_SFIX32

Intermediate data's type and mantissa bits are listed within each function.

Details of each of the above data types are listed below. The reasons why
such data types and mantissa bits were chosen are explained.



Lighting

Position/Direction

Light position or direction is taken as:

Light position      SFIX32Quad    NHCS

This representation provides enough range and precision for lighting,
and no extra cost exists compared with a traditional representation such as
non-NHCS.

Viewpoint

Viewpoint is represented as:

Viewpoint           SFIX32Quad    NHCS

This representation provides enough range and precision for lighting, and
no extra cost exists compared with a traditional representation such as
non-NHCS.

Lighting color

Lighting color includes:

♦ Ambient
♦ Diffuse
♦ Specular

Their representation is:

Lighting color      UFIX8Quad     No mantissa

This representation is a natural expansion of color in D3D in A8R8G8B8 style.

Material property

Material color includes:

♦ Ambient
♦ Diffuse
♦ Specular

Each of them is represented as:

Material color      UFIX8Quad     No mantissa

This representation is a natural expansion of color in D3D in A8R8G8B8 style. The
power component is represented as:

Power component     UFIX8         No mantissa

In one embodiment of the NHCS graphics rendering system 100, the
power is assumed to be an integer from 0 to 127.



Normal

Normal is taken as:

Normal              SFIX16        NORMAL_SFIX16 mantissa

From empirical evidence, it is concluded that a 16-bit normal is enough for
rendering a Microsoft® Windows CE® device window. In a preferred
embodiment, NORMAL_SFIX16 is equal to 14. Moreover, 1 sign bit must
be preserved and 1 additional bit should be preserved as the integer part for normal
coordinates like 1.0 or -1.0.

Texture coordinate

Texture coordinate is represented as:

Texture coordinate  SFIX16        TEXTURE_SFIX16 mantissa

In a preferred embodiment, TEXTURE_SFIX16 is equal to 12.
Further, there is 1 bit for sign and 3 bits for an integer part. This provides
support for finite tiling (-8 to 8), and gives 4-bit sub-pixel resolution for a texture
as large as 256x256. Note that there is a trade-off between the tiling size and
sub-pixel resolution.
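
The bit budget described for texture coordinates can be illustrated with the following C sketch. It assumes a 256-texel texture dimension, wrap-around tiling, and two's complement storage; the names `TEX_SIZE_LOG2`, `texel_index` and `subtexel` are hypothetical.

```c
#include <stdint.h>

typedef int16_t SFIX16;

#define TEXTURE_SFIX16 12   /* mantissa bits, from the preferred embodiment */
#define TEX_SIZE_LOG2   8   /* 256-texel texture (assumption of this sketch) */

/* Integer texel index after wrapping (tiling) the coordinate into [0, 1). */
static int texel_index(SFIX16 u)
{
    int frac = u & ((1 << TEXTURE_SFIX16) - 1);       /* drop sign and integer bits */
    return frac >> (TEXTURE_SFIX16 - TEX_SIZE_LOG2);  /* top 8 fraction bits */
}

/* Sub-texel position: the 4 fraction bits left below the texel index. */
static int subtexel(SFIX16 u)
{
    return u & ((1 << (TEXTURE_SFIX16 - TEX_SIZE_LOG2)) - 1);
}
```

Of the 12 mantissa bits, 8 select one of 256 texels and the remaining 4 give the sub-pixel position. A larger texture would leave fewer sub-pixel bits, which is the trade-off noted in the text.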



Output vertex coordinate

The NHCS graphics rendering system 100 produces an output vertex
suitable for a vertex shader. The representation is:

x    SFIX32    ONE_SFIX32 mantissa
y    SFIX32    ONE_SFIX32 mantissa
z    SFIX32    ONE_SFIX32 mantissa
w    SFIX32    DEFAULT_SFIX32 mantissa

When a vertex is within a view frustum, the values for x and y will be within (-1, 1), and
z within (0, 1). A vertex outside the view frustum will be clipped before output. That
is why ONE_SFIX32 can be given as 30 without suffering from overflow. The w
component is not normalized to (-1, 1). A 16-bit fraction and a 15-bit integer is a
good balance between the precision and range of w.



Matrices

Prior to rendering, several matrices should be ready. All matrices are of
the data structure SFIX32, with DEFAULT_SFIX32 bits mantissa.

Model space to world space
M_w: Transform matrix from model space to world space.

Currently, a D3DM implementation assumes that the last column of this
matrix is (0,0,0,1)^T. No error is returned; if a user specifies a matrix with a
different last column, the texture coordinate and fog will be incorrect.

World space to view space
M_v: Transform matrix from world space to view space.

Currently, a D3DM implementation assumes that the last column of this
matrix is (0,0,0,1)^T. No error is returned; if a user specifies a matrix with a
different last column, the texture coordinate and fog will be incorrect.

View space to clip space
M_p: Projection matrix from view space to clip space.

Currently, a D3DM implementation assumes that the last column of this
matrix is (0,0,1,0)^T or (0,0,a,0)^T. No error is returned. For correct fog, the last
column should be (0,0,1,0)^T to give a correct w value. This is called the
W-friendly projection matrix.

Model space to view space
M_wv: Matrix combination from model space to view space.

M_wv = M_w M_v

A D3DM implementation combines the matrices M_w and M_v, and the last
column of this matrix is (0,0,0,1)^T. No error is returned. If a user specifies a matrix
with a different last column, the texture coordinate and fog will be incorrect.

Model space to clip space
M_wvp: Matrix combination from model space to clip space.

M_wvp = M_w M_v M_p

A D3DM implementation combines the matrices M_w, M_v and M_p. The last
column of this matrix is determined by the parameters of these matrices. No
error is returned.

V. Operational Overview 

The NHCS graphics rendering system 100 disclosed herein uses the
NHCS graphics rendering method to enable efficient and fast graphics rendering
on a mobile computing device. FIG. 4 is a general flow diagram illustrating the
operation of the NHCS graphics rendering method of the NHCS graphics
rendering system 100 shown in FIG. 1. The method begins by inputting
rendering data (box 400). In one embodiment, the rendering data is in a
floating-point format. In another embodiment, the rendering data is in a fixed-point
format. Next, the rendering data is converted into a variable-length fixed-point
format including a normalized homogenous coordinate system (NHCS)
fixed-point format for vector operations (box 410). The NHCS fixed-point format allows
computations and operations to be performed on the converted rendering data
such that a range can be predicted. Any data outside of the range is truncated.
This processing of the data in the NHCS fixed-point format allows more efficient
use of valuable memory and processing power.

An NHCS data structure then is defined to characterize the converted
rendering data (box 420). Next, a fixed-point math library is used to process the
rendering data in the NHCS data structure (box 430). The math library includes
mathematical operations and graphics functions. The processed rendering data
then is ready for rendering by a rendering engine.

VI. Operational Details 

FIG. 5 is a detailed flow diagram illustrating the operation of the
conversion process of the task module 140 shown in FIGS. 1 and 3. In general,
the task module 140 converts input rendering data into an NHCS fixed-point
format. The input format can be a floating-point format or a fixed-point format.
The task module 140 also includes data structure definitions and preliminary
mathematical operations.

In general, a normalized homogenous coordinate system (NHCS) is a vertex
representation. More specifically, as shown in FIG. 5, the input to the task
module 140 is scalar values representing a vertex in either a floating-point or a
fixed-point format (box 500). Next, a maximum scalar value is determined from
among all of the scalar values (box 510). Moreover, a maximum fixed-point
buffer representation for a destination buffer is determined (box 520). In one
embodiment, the maximum fixed-point buffer representation is the size of the
destination buffer, characterized by the number of bits.

Next, the maximum scalar value is scaled to the maximum fixed-point
buffer representation (box 530). The number of digits that the value is shifted
then is recorded (box 540). This number of digits is known as the shift digit.
Using the shift digit, the remainder of the scalar values are normalized (box 550).
The output is the input values represented in an NHCS fixed-point format (box
560).

FIG. 6 is a working example of the conversion process shown in FIG. 5.
By way of example, assume that a vertex has 4 scalars, a, b, c and d. As shown
in FIG. 6, c has the maximal value among these 4 scalars. The following steps
convert a vector to an NHCS representation. First, the maximum of the four
scalars is determined. As shown in FIG. 6, the maximum scalar value is scalar c.
It should be noted that each of the scalars can be represented
using 64 bits (0-63). Next, the maximum size of the destination fixed-point buffer
representation is determined. In FIG. 6, this size is represented by the window
600 (shown outlined as a thicker line). The maximum size of the destination
fixed-point buffer is 32 bits (0-31). Thus, the size of the window is 32 bits.

Next, scaling is performed such that the maximum scalar value (scalar c)
is scaled to the maximum size of the destination fixed-point buffer, in this case 32
bits. The shift digit r, or the number of digits needed to shift scalar c, is recorded.
Finally, the shift digit r is used to normalize the rest of the scalars (a, b, d) based
on the maximum scalar c. This converts input data in a floating-point or
fixed-point format into an NHCS fixed-point format.

NHCS preserves the full resolution of the maximal scalar in vector
computation. With NHCS, the intermediate result is stored as 2L bits for the
original L bits, which assures no precision loss in multiplication. The
intermediate result is then truncated to L bits to preserve maximum precision (in
FIG. 6, L=32).
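
The conversion steps of FIGS. 5 and 6 can be sketched in C as shown below. This is a minimal sketch under stated assumptions, not the implementation of the task module 140: the names `bit_length` and `to_nhcs` are hypothetical, the inputs are taken to be 64-bit signed integers, and the destination is a signed 32-bit buffer in which one bit is reserved for the sign, leaving 31 magnitude bits. A positive shift digit means a right shift and a negative one a left shift.

```c
#include <stdint.h>

typedef int64_t SFIX64;
typedef int32_t SFIX32;

/* Number of significant bits in a magnitude (0 for zero). */
static int bit_length(uint64_t v)
{
    int n = 0;
    while (v != 0) { ++n; v >>= 1; }
    return n;
}

/* Convert four 64-bit scalars to a 32-bit NHCS vector.
   Returns the shift digit r that was applied to every scalar. */
static int to_nhcs(const SFIX64 in[4], SFIX32 out[4])
{
    /* Step 1: find the maximum magnitude among the scalars. */
    uint64_t max_mag = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t mag = in[i] < 0 ? (uint64_t)0 - (uint64_t)in[i]
                                 : (uint64_t)in[i];
        if (mag > max_mag) max_mag = mag;
    }
    /* Steps 2-3: scale the maximum to fill the 31 magnitude bits
       and record the shift digit. */
    int r = (max_mag == 0) ? 0 : bit_length(max_mag) - 31;
    /* Step 4: normalize every scalar with the same shift digit.
       Powers of two are applied by multiply/divide so that negative
       values stay well defined in C. */
    for (int i = 0; i < 4; ++i) {
        out[i] = (r >= 0) ? (SFIX32)(in[i] / ((SFIX64)1 << r))
                          : (SFIX32)(in[i] * ((SFIX64)1 << -r));
    }
    return r;
}
```

Because all four scalars share one shift digit, their ratios are preserved, which is what a homogeneous coordinate needs. Only the largest scalar is guaranteed to use the full width of the destination buffer.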

FIG. 7 is a detailed flow diagram illustrating the operation of the API
module 150 shown in FIGS. 1 and 3. In general, the API module 150 generates
buffers for the data computed and sent by the task module 140. In addition, the
API module 150 prepares the data for the driver module 160. The API module
process begins by inputting rendering data in an NHCS data structure (box 700).
Next, specialized buffers are created or generated for storing the converted
rendering data (box 710). This includes generating an index buffer, a vertex
buffer and a command buffer. Finally, the rendering data is prepared for the
driver module (box 720). Preparing the rendering data includes specifying 2-D
and 3-D primitives and specifying how those primitives are to be drawn.

FIG. 8 is a detailed flow diagram illustrating the operation of the driver
module 160 shown in FIGS. 1 and 3. In general, the driver module 160 prepares the
rendering data for the rendering engine and raster. More specifically, the driver
module 160 inputs the stored data from the API module (box 800). Next, the
mathematical library is used to convert the 3-D input data into 2-D screen
coordinates (box 810). The mathematical library also is used to prepare input
data for the rendering engine and for raster (box 820). Finally, the driver
module 160 outputs 2-D data in screen coordinates for rendering on a monitor
(box 830).

Mathematical Library 

The mathematical library includes mathematical operations and graphics 
functions. The mathematical library now will be discussed in detail. 

Feature division 

The features of the mathematical library are divided into features that are
supported by the rasterizer, resource management, and features supported by
transform and lighting (T&L). The mathematical library implements all features
supported by T&L.

Features Supported in the Rasterizer

The following features in the mathematical library are
supported by the rasterizer:

♦ Point, line list, line strip, tri list, tri strip and tri fan rendering
♦ Point, wireframe, solid fill
♦ Flat and Gouraud shading
♦ Depth test with various compare modes and pixel rejection
♦ Stencil compare and pixel rejection
♦ Depth buffer-less rendering is supported as well
♦ W buffer support
♦ MipMap textures are supported (Interpolate)
♦ 8 stage multi-texture with D3D8 fixed function blending options
♦ Point, linear, anisotropic, cubic and Gaussian cubic texture filtering
♦ Alpha blending (with several blend modes)
♦ Palettized textures
♦ Perspective correct texturing (not on by default)
♦ Color channel masking (COLORWRITEENABLE)
♦ Dithering
♦ Multisampling for FSAA
♦ Texture address modes


Features Supported in Resource Management 

Resources are objects that are resident in memory, such as textures, 
vertex buffers, index buffers and render surfaces. Resource management is the 
25 management of the various memory operations on these objects. These 
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operations include allocation, copying, moving, locking for exclusive usage, 
unlock and de-allocation. The following features are features in the mathematical 
library that are supported in resource management: 

♦ Swap chain creation and management for display 

♦ Depth/stencil buffer creation and management 

♦ Vertex buffer creation and management 

♦ Index buffer creation and management 

♦ Texture map creation and management 

♦ Many texture formats including DXT compressed texture 

♦ Scratch surface creation/management for texture upload 

♦ MipMap textures are supported (Build) 

♦ Dirty rectangular texture update mechanism 

♦ All buffers lockable (assuming driver support!) 



Features Supported in T&L 

The following features of the mathematical library are supported in T&L:

♦ Texture coordinate generation 

♦ View, projection and world transform matrices 

♦ Single transform matrix per texture coordinate set (8 sets max) 

♦ Up to 4 dimensions per texture coordinate set 

♦ Ambient/diffuse/specular lighting and materials 

♦ Directional and point lights 

♦ Back face culling 

♦ Fog (depth and table based) 

Math functions indexed by features

In this section, the mathematical functions are described, indexed by
feature. The functions cover transform, culling, lighting, texture and
other miscellaneous functions. In addition, the overflow and underflow
(resolution loss) problems of these functions are discussed.

Transform functions 



NHCS vector transform

int TransQuad_SFIX32(SFIX32Quad b, SFIX32Mat4x4 m, SFIX32Quad c)

This function transforms a 32-bit NHCS vector b into another 32-bit
NHCS vector c by matrix m.

Parameters

b
Input vector in SFIX32Quad, in NHCS format.

m
Transform matrix in SFIX32Mat4x4, in DEFAULT_SFIX32 format.

c
Output vector after the transform, in SFIX32 format with NHCS
representation.

Return value

An integer indicating the number of bits shifted when converting the
intermediate 64-bit values to the 32-bit NHCS vector c.

Remarks

♦ Overflow:

The maximum possible intermediate value is 4*(0x8000
0000*0x8000 0000) = 0x1 0000 0000 0000 0000, which does not fit
in 64 bits. A 64-bit intermediate value can therefore overflow
before the NHCS conversion.

♦ Underflow:

Appears when the result is truncated from the intermediate buffer.

Matrix combination

void MatMul4x4_SFIX32(SFIX32Mat4x4 m1, SFIX32Mat4x4 m2,
SFIX32Mat4x4 m3, UFIX8 n)

This function combines two 32-bit 4x4 matrices into another 32-bit 4x4
matrix.

Parameters

m1, m2
Input matrices in SFIX32Mat4x4.

n
Input shift bits for shifting the 64-bit multiplication results to
32-bit results.

m3
Output combined matrix.

Return value

No return value.

Remarks

♦ Shift:

The matrices m1, m2 and m3 can have different mantissa bits.
Suppose m1 has an a-bit mantissa and m2 has a b-bit mantissa;
to get a c-bit mantissa in m3, n should be set to (a+b)-c.

♦ Overflow:

The maximum possible intermediate value is 4*(0x8000
0000*0x8000 0000) = 0x1 0000 0000 0000 0000, which does not fit
in 64 bits. A 64-bit intermediate value can therefore overflow.
When truncating the 64-bit intermediate result to the 32-bit
output, overflow is also possible.

♦ Underflow:

Appears when the result is truncated from the intermediate buffer.
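
The shift rule n = (a+b)-c can be illustrated with a minimal C sketch. The typedefs and the name MatMul4x4_sketch are assumptions made for illustration only and are not taken from the library; the sketch also assumes that right-shifting a negative value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

typedef int32_t SFIX32;               /* assumed: 32-bit signed fixed point */
typedef int64_t SFIX64;               /* assumed: 64-bit intermediate       */
typedef uint8_t UFIX8;
typedef SFIX32 SFIX32Mat4x4[4][4];    /* assumed layout: row-major 4x4      */

/* Hypothetical sketch: m3 = m1 * m2.
 * If m1 has an a-bit mantissa and m2 a b-bit mantissa, each 64-bit sum of
 * products has an (a+b)-bit mantissa; shifting right by n = (a+b)-c leaves
 * a c-bit mantissa in m3. No overflow checking is done here. */
void MatMul4x4_sketch(SFIX32Mat4x4 m1, SFIX32Mat4x4 m2,
                      SFIX32Mat4x4 m3, UFIX8 n)
{
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            SFIX64 acc = 0;
            for (int k = 0; k < 4; ++k)
                acc += (SFIX64)m1[i][k] * (SFIX64)m2[k][j];
            /* assumes arithmetic right shift for negative values */
            m3[i][j] = (SFIX32)(acc >> n);
        }
    }
}
```

For example, multiplying two matrices that both use 16 mantissa bits and asking for a 16-bit mantissa in the result gives n = (16+16)-16 = 16.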



Non-NHCS vector transform

void TransQuad_SFIX16(SFIX16Quad b, SFIX32Mat4x4 m, SFIX16Quad c)

This function transforms a 16-bit vector into a 16-bit vector.

Parameters

b
Input vector in SFIX16Quad with TEXTURE_SFIX16 bits mantissa.

m
Transform matrix in SFIX32Mat4x4, in DEFAULT_SFIX32 format.

c
Output vector after the transform, in SFIX16 format with
TEXTURE_SFIX16 bits mantissa.

Return value

No return value.

Remarks

♦ Overflow:

Appears when the result goes out of the range of the
TEXTURE_SFIX16 mantissa.

♦ Underflow:

Appears when the result goes out of the range of the
TEXTURE_SFIX16 mantissa.




void TransNorm_SFIX16(SFIX16Triple b, SFIX32Mat4x4 m, SFIX16Triple c)

This function transforms a 16-bit normal into a 16-bit normal.

Parameters

b
Input vector in SFIX16Triple with NORMAL_SFIX16 bits mantissa.

m
Transform matrix in SFIX32Mat4x4, in DEFAULT_SFIX32 format.

c
Output vector after the transform, in SFIX16 format with
NORMAL_SFIX16 bits mantissa, normalized.

Return value

No return value.

Remarks

♦ Matrix:

For transforming a normal, only the upper 3x3 part of m is used.

♦ Normalization:

The output is normalized by Normalize_SFIX16Triple().



NHCS to non-NHCS conversion

void DivWW_SFIX32(SFIX32 w, int shift, SFIX32Quad c, SFIX32Quad cc)

This function transforms an NHCS vertex into a clip space non-NHCS vertex.

Parameters

w
Input w to be divided from the NHCS vertex, SFIX32. It is the
b[3] in TransQuad_SFIX32().

shift
Input shifted bits returned from TransQuad_SFIX32(), used for
calculating the correct w.

c
Input vertex after TransQuad_SFIX32(), NHCS.

cc
Output vertex in non-NHCS SFIX32 format. cc[0]~cc[2] have
ONE_SFIX32 bits mantissa, and cc[3] has DEFAULT_SFIX32
bits mantissa.

Return value

No return value.

Remarks

♦ This function is related to TransQuad_SFIX32().

♦ With this function the actual clip space vertex is obtained from
the NHCS clip space vertex, for final conversion to a floating-point
vertex and output to the vertex shader.



void DivW_SFIX32(SFIX32 w, int shift, SFIX32Quad c, SFIX32Quad cc)

This function transforms an NHCS vertex into a clip space non-NHCS vertex.

Parameters

w
Input w to be divided from the NHCS vertex, SFIX32. It is the
b[3] in TransQuad_SFIX32().

shift
Input shifted bits returned from TransQuad_SFIX32(), used for
calculating the correct w.

c
Input vertex after TransQuad_SFIX32(), NHCS.

cc
Output vertex in DEFAULT_SFIX32 format.

Return value

No return value.

Remarks

♦ This function is related to TransQuad_SFIX32().

♦ This function is used in texture coordinate generation
from view space position, so its precision and range are
different from those of DivWW_SFIX32 above.


Culling functions

Backface testing

BOOL Backface_SFIX32(SFIX32* a, SFIX32* b, SFIX32* c, BOOL bCCW)

This function checks whether the triangle (a, b, c) is a back face.

Parameters

a, b, c
Three sequential vertices of a triangle, in SFIX32Quad
with NHCS representation.

bCCW
Face orientation, TRUE for CCW, FALSE for CW.

Return value

BOOL, TRUE for a back face, FALSE for a non-back face.

Remarks

♦ There is a sequential multiplication of 3 operands.

♦ NHCS is used to compress the operands from 32 bits to
16 bits, since only the sign is needed.

View frustum culling

View frustum culling removes the triangles whose vertices are all outside
of one view frustum plane. The view frustum involves 6 planes:

♦ Left plane.

♦ Right plane.

♦ Top plane.

♦ Bottom plane.

♦ Near plane.

♦ Far plane.

A UFIX8 is set to hold 6 flags for culling. FIG. 9 illustrates an exemplary
implementation of a buffer to store culling planes. In particular, FIG. 9 shows a
UFIX8 format buffer to store the culling planes. View frustum culling is performed
in clip space. If it is assumed that b is an NHCS coordinate in the clip space, the
algorithm is:

    SFIX32Quad b;   // NHCS clip space coordinates
    UFIX8 f = 0;
    if (b[0] < -b[3])
        f |= 0x01;
    else if (b[0] > b[3])
        f |= 0x02;
    if (b[1] < -b[3])
        f |= 0x04;
    else if (b[1] > b[3])
        f |= 0x08;
    if (b[2] < 0)
        f |= 0x10;
    else if (b[2] > b[3])
        f |= 0x20;

Once the flags of a triangle's three vertices are obtained, an "AND"
operation can be used to test whether the vertices are all outside of the
same plane.

The flag is also useful in the vertex cache, where the 2 unused bits
indicate:

♦ Transformed status (indicates whether a vertex has been transformed)

♦ Lit status (indicates whether a vertex has been lit).
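
The flag computation and the "AND" test can be put together in a short C sketch. The typedefs and the function names FrustumFlags_sketch and TriangleOutside_sketch are hypothetical; the mask 0x3F is an assumption that keeps the two vertex-cache status bits out of the culling test.

```c
#include <stdint.h>

typedef int32_t SFIX32;          /* assumed: 32-bit signed fixed point */
typedef uint8_t UFIX8;
typedef SFIX32 SFIX32Quad[4];    /* assumed layout: x, y, z, w         */

/* Computes the 6 culling flags for one NHCS clip space vertex,
 * using the same comparisons and bit values as the algorithm above. */
UFIX8 FrustumFlags_sketch(const SFIX32Quad b)
{
    UFIX8 f = 0;
    if (b[0] < -b[3])      f |= 0x01;
    else if (b[0] > b[3])  f |= 0x02;
    if (b[1] < -b[3])      f |= 0x04;
    else if (b[1] > b[3])  f |= 0x08;
    if (b[2] < 0)          f |= 0x10;
    else if (b[2] > b[3])  f |= 0x20;
    return f;
}

/* Returns nonzero when all three vertices are outside the same plane.
 * 0x3F masks off the two upper bits used for vertex cache status. */
int TriangleOutside_sketch(UFIX8 f0, UFIX8 f1, UFIX8 f2)
{
    return (f0 & f1 & f2 & 0x3F) != 0;
}
```

A triangle is rejected only when one flag bit is set in all three vertices; a triangle whose vertices are outside different planes may still cross the frustum and has to be kept for clipping.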

Lighting functions

Direct3D for mobile supports both directional lights and point lights. The
lighting model used is the Phong model for vertices. Lighting is done in model
space. A material should be assigned to the object, and its ambient, diffuse,
specular and power properties are denoted as $M_{Ambient}$, $M_{Diffuse}$,
$M_{Specular}$ and $M_{Power}$ respectively. In D3D, $M_{Ambient}$,
$M_{Diffuse}$ and $M_{Specular}$ are defined as (r, g, b, a), and each
component is a float within [0,1].



Each component therefore need only be represented as:

Lighting component    UFIX8    8 bits mantissa

The color of lighting is denoted as $L_{Ambient}$, $L_{Diffuse}$ and
$L_{Specular}$. Given normalized vectors N, L and V, which represent the vertex
normal, vertex-light direction and vertex-view direction respectively, the
color of a vertex can be calculated as:

$$C = L_{Ambient} M_{Ambient} + L_{Diffuse} M_{Diffuse} (N \cdot L) + L_{Specular} M_{Specular} (N \cdot H)^{M_{Power}}$$



FIGS. 10A and 10B illustrate an exemplary implementation of normalized
vectors in a D3DM Phong Model. As shown in FIG. 10A, L is the vector from
vertex to light, and N is the vertex normal. R is the reflection direction of light,
which is symmetric to L about N. As shown in FIG. 10B, V is the vector from vertex
to view point, and H is the half vector of L+V.

All the vectors are transformed to the same space for "dot product"
computation, and are normalized for calculation. In this implementation,
model space was chosen because it saves transforming each vertex normal to
view space. However, this choice also brings problems if the model transform
contains shears and scaling. Although lighting in model space is discussed here,
it is easy to extend the discussion to other spaces. Both lighting in model space
and lighting in view space are supported in the rendering pipeline of the NHCS
graphics rendering system.



Inverse Length of a Normal

SFIX32 TripleInvLen(SFIX16Triple a)

This function gives the inverse length of an SFIX16Triple, which is useful
in normalization.

Parameters

a
Un-normalized input in SFIX16, in NHCS.

Return value

Inverse length in SFIX32.

Remarks

♦ Assuming a has an n-bit mantissa, the result has a (42-n)-bit
mantissa. It does not matter if 42-n>32, because the
calculation does not use n explicitly.

♦ Newton's iteration method is used here for solving the
inverse square root, using a 256-item lookup table.
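
Newton's iteration for the inverse square root, y' = y(3 - x*y*y)/2, can be sketched in fixed point as follows. This is only an illustration under stated assumptions: it works on a scalar in unsigned Q16.16 rather than on an SFIX16Triple, the name InvSqrt_sketch is hypothetical, and a power-of-two initial guess is swapped in for the 256-item lookup table, so it needs more iterations than a table-seeded version would.

```c
#include <stdint.h>

/* Hypothetical sketch: returns approximately 1/sqrt(x),
 * with both x and the result in unsigned Q16.16 (assumed format). */
uint32_t InvSqrt_sketch(uint32_t x)
{
    if (x == 0)
        return 0;   /* 1/sqrt(0) is undefined; 0 is a placeholder */

    /* Count the significant bits of x. */
    int digits = 0;
    for (uint32_t v = x; v != 0; v >>= 1)
        ++digits;

    /* Power-of-two initial guess that never exceeds the true value
     * and is at least half of it, so the iteration converges. */
    uint64_t y = (uint64_t)1 << (24 - (digits + 1) / 2);

    /* Newton's iteration: y = y * (3 - x*y*y) / 2, all in Q16.16. */
    for (int i = 0; i < 6; ++i) {
        uint64_t t = ((uint64_t)x * y) >> 16;   /* x*y   */
        t = (t * y) >> 16;                      /* x*y*y */
        y = (y * (3 * 65536u - t)) >> 17;       /* shift 16 for Q16.16, 1 more for /2 */
    }
    return (uint32_t)y;
}
```

Because every step truncates, the result can land one or two units below the exact value; a table-based initial guess of the kind mentioned in the remarks would cut the iteration count.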



NHCS Vector Normalization

void Normalize_SFIX16Triple(SFIX16Triple a, SFIX16Triple b)

This function normalizes an NHCS SFIX16Triple.

Parameters

a
Un-normalized input in SFIX16, in NHCS.

b
Normalized output in SFIX16 format with NORMAL_SFIX16
mantissa.

Return value

No return value.

Remarks

♦ SFIX32 is used to hold the intermediate TripleInvLen()
result to prevent overflow and keep precision.


Negative Normalization of NHCS Vector

void NegNormalize_SFIX16Triple(SFIX16Triple a, SFIX16Triple b)

This function gives the negated result of Normalize_SFIX16Triple.

Parameters

a
Un-normalized input in SFIX16, in NHCS.

b
Normalized output in SFIX16 format with NORMAL_SFIX16
mantissa.

Return value

No return value.

Remarks

♦ SFIX32 is used to hold the intermediate TripleInvLen()
result to prevent overflow.

♦ It is used in the normalization of a directional light, and gives a
normal L from the vertex to the lighting source.


Subtraction of Two NHCS Vectors

void SubNorm_SFIX32Quad(SFIX32Quad a, SFIX32Quad b,
SFIX16Triple c)

This function calculates a normal from the subtraction of two NHCS vectors.

Parameters

a, b
Input vectors in SFIX32 with NHCS format.

c
Normalized (a-b) in SFIX16 with NORMAL_SFIX16 bits
mantissa.

Return value

No return value.

Remarks

It is used in the normalization of view direction V and light
direction L when using a point light.


Dot Product of Two Normalized Vectors

UFIX16 Dot_SFIX16Triple(SFIX16Triple a, SFIX16Triple b)

This function returns the dot product of two normalized vectors.

Parameters

a, b
Normalized input in SFIX16 with DEFAULT_SFIX16 bits
mantissa.

Return value

Dot product with ONE_UFIX16 bits mantissa.

Remarks

♦ If the two vectors are normalized, there will be no overflow at
all because the result will be within (0,1).

♦ A value less than 0 is clamped to 0.


Power

UFIX16 Power_UFIX16(UFIX16 a, UFIX8 n)

This function returns power(a, n).

Parameters

a
Power base with ONE_UFIX16 bits mantissa.

n
Power exponent within 0-127.

Return value

Power value in UFIX16 format.

Remarks

♦ The efficient digits of n determine how many
multiplications are needed.

♦ In the rendering pipeline n can be fixed. Static
variables are used to store n and its efficient digits. If n is the
same in consecutive calls, the previously computed efficient
digits are reused instead of being calculated again.
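
One common way for the number of multiplications to depend on the efficient digits of n is exponentiation by squaring, sketched below. Whether the library uses exactly this scheme is not stated in the text. The name Power_sketch and the constant FRAC_BITS are hypothetical; in particular, a 15-bit mantissa (so that 1.0 is 0x8000) is an assumption, since the value of ONE_UFIX16 is not given here. The static caching of n described above is left out.

```c
#include <stdint.h>

#define FRAC_BITS 15   /* assumed mantissa width; 1.0 == 0x8000 */

/* Hypothetical sketch: a^n for a in [0, 1.0], by repeated squaring.
 * The loop runs once per significant bit of n. */
uint16_t Power_sketch(uint16_t a, uint8_t n)
{
    uint32_t result = (uint32_t)1 << FRAC_BITS;   /* 1.0 */
    uint32_t base = a;

    while (n != 0) {
        if (n & 1)
            result = (result * base) >> FRAC_BITS;
        base = (base * base) >> FRAC_BITS;
        n >>= 1;
    }
    return (uint16_t)result;
}
```

With n limited to 0-127 the loop body runs at most 7 times, and each pass performs one or two multiplications.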

Half Vector

The half vector is used to approximate the actual $\cos\theta = (V \cdot R)$ by
$\cos\psi = (N \cdot H)$ for calculating the specular component. H can be
calculated from the normalized L and V:

$$H = \frac{L + V}{|L + V|}$$

L and V are represented by SFIX16Triple with NORMAL_SFIX16 bits
mantissa. To avoid overflow and keep precision, they are first added together as
an SFIX32Triple. Next, the half vector H is converted to an NHCS SFIX16Triple,
and H is then normalized.

Texture Coordinate Generation

Texture coordinate generation uses the view space normal, position or
reflection vector to generate the texture coordinates at each vertex. The view
space normal and position are available after lighting in view space. However,
reflection vectors need to be calculated here.



Reflection Vector from Normal and View

void CalcR_SFIX16Triple(SFIX16Triple norm, SFIX16Triple view,
SFIX16Triple reflect)

This function calculates the reflection vector from the normal and view
vectors.

Parameters

norm
Normalized normal in SFIX16, NORMAL_SFIX16.

view
Normalized view direction in SFIX16, NORMAL_SFIX16.

reflect
Normalized output in SFIX16 format with NORMAL_SFIX16
mantissa.

Return value

No return value.

Remarks

$R = 2(N \cdot V)N - V$
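
The reflection formula can be evaluated in fixed point as in the following minimal sketch. The name CalcR_sketch, the typedefs and the 14-bit mantissa (FRAC_BITS) are assumptions for illustration; the actual value of NORMAL_SFIX16 is not given in the text.

```c
#include <stdint.h>

#define FRAC_BITS 14                         /* assumed mantissa width */
#define ONE ((int64_t)1 << FRAC_BITS)        /* 1.0 in this format     */

typedef int16_t SFIX16;
typedef SFIX16 SFIX16Triple[3];

/* Hypothetical sketch of R = 2(N.V)N - V for normalized N and V. */
void CalcR_sketch(const SFIX16Triple norm, const SFIX16Triple view,
                  SFIX16Triple reflect)
{
    /* N.V, kept with 2*FRAC_BITS fractional bits to preserve precision. */
    int64_t dot = 0;
    for (int i = 0; i < 3; ++i)
        dot += (int64_t)norm[i] * view[i];

    for (int i = 0; i < 3; ++i) {
        /* 2(N.V)N[i] has 3*FRAC_BITS fractional bits before scaling. */
        int64_t twice = 2 * dot * norm[i];
        reflect[i] = (SFIX16)(twice / (ONE * ONE) - view[i]);
    }
}
```

Division is used for the rescaling instead of a shift so that negative intermediate values round toward zero in a portable way.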



NHCS Clip Space Coordinates Clipping Algorithm

The model-view transform and view-projective transform can be combined
into a 4x4 matrix $P_{4\times4}$:

$$(x_p \;\; y_p \;\; z_p \;\; w_p) = (x \;\; y \;\; z \;\; 1)\,P_{4\times4} \qquad (1)$$

The term $x_w = x_p / w_p$ is defined, and similarly for y and z. In fact, these
terms are the normalized screen space coordinates. This assumes the correct
$w_p$ is obtained for each vertex. Multiplying (1) by
$\frac{1}{w_p} P_{4\times4}^{-1}$ yields:

$$\left(\frac{x}{w_p} \;\; \frac{y}{w_p} \;\; \frac{z}{w_p} \;\; \frac{1}{w_p}\right) = (x_w \;\; y_w \;\; z_w \;\; 1)\,P_{4\times4}^{-1} \qquad (2)$$

Equation (2) is a linear equation, which indicates that $1/w_p$ can be linearly
interpolated. Given three vertices $(x_i \;\; y_i \;\; z_i \;\; 1)$ and three
texture coordinates $(u_i \;\; v_i \;\; 1)$ (i=1,2,3) for a triangle, there exists
an affine transform $A_{3\times4}$ which maps texture coordinates to object
space, if the triangle is not degenerate:

$$(u \;\; v \;\; 1)\,A_{3\times4} = (x \;\; y \;\; z \;\; 1) \qquad (3)$$

Combining (3) and (1), and dividing both sides by $w_p$, yields:

$$\left(\frac{u}{w_p} \;\; \frac{v}{w_p} \;\; \frac{1}{w_p}\right) B = (x_w \;\; y_w \;\; z_w \;\; 1) \qquad (4)$$

where $B = A_{3\times4} P_{4\times4}$.

Equation (4) indicates that $u/w_p$ and $v/w_p$ can be interpolated linearly. For
perspective-correct texture mapping, after linearly interpolating $u/w_p$,
$v/w_p$ and $1/w_p$, the correct texture coordinates can be computed.
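
The recovery of perspective-correct texture coordinates from the linearly interpolated quantities can be sketched as follows. Floating point is used here only to keep the illustration short; the TexVertex structure and the function name PerspectiveLerp_sketch are hypothetical and are not part of the library.

```c
/* Hypothetical vertex carrying texture coordinates and clip space w. */
typedef struct {
    double u, v, w;
} TexVertex;

/* Interpolates u/w, v/w and 1/w linearly with the screen space
 * parameter t in [0, 1], then divides to recover u and v. */
void PerspectiveLerp_sketch(const TexVertex *p1, const TexVertex *p2,
                            double t, double *u, double *v)
{
    double inv_w1 = 1.0 / p1->w;
    double inv_w2 = 1.0 / p2->w;

    double u_over_w = p1->u * inv_w1 + (p2->u * inv_w2 - p1->u * inv_w1) * t;
    double v_over_w = p1->v * inv_w1 + (p2->v * inv_w2 - p1->v * inv_w1) * t;
    double inv_w    = inv_w1 + (inv_w2 - inv_w1) * t;

    *u = u_over_w / inv_w;
    *v = v_over_w / inv_w;
}
```

Halfway across the screen between a vertex with (u, w) = (0, 1) and one with (u, w) = (1, 3), this gives u = 0.25 rather than the 0.5 that plain linear interpolation of u would give, because the nearer vertex covers more of the screen.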

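The algorithm for interpolating between two points is:

Input: points $(x_{1p} \;\; y_{1p} \;\; z_{1p} \;\; w_{1p})$ and $(x_{2p} \;\; y_{2p} \;\; z_{2p} \;\; w_{2p})$

Clip plane: $a x_w + b y_w + c z_w + d = 0$

The intersection point $(x_p \;\; y_p \;\; z_p \;\; w_p)$ will satisfy:

$$x_w = \frac{x_p}{w_p} = \frac{x_{1p}}{w_{1p}} + \left(\frac{x_{2p}}{w_{2p}} - \frac{x_{1p}}{w_{1p}}\right)t = x_{1w} + (x_{2w} - x_{1w})t$$

$$y_w = \frac{y_p}{w_p} = \frac{y_{1p}}{w_{1p}} + \left(\frac{y_{2p}}{w_{2p}} - \frac{y_{1p}}{w_{1p}}\right)t = y_{1w} + (y_{2w} - y_{1w})t$$

$$z_w = \frac{z_p}{w_p} = \frac{z_{1p}}{w_{1p}} + \left(\frac{z_{2p}}{w_{2p}} - \frac{z_{1p}}{w_{1p}}\right)t = z_{1w} + (z_{2w} - z_{1w})t$$

$$\frac{1}{w_p} = \frac{1}{w_{1p}} + \left(\frac{1}{w_{2p}} - \frac{1}{w_{1p}}\right)t$$

Substituting into the clip plane yields:

$$a x_{1w} + b y_{1w} + c z_{1w} + d + \left(a(x_{2w} - x_{1w}) + b(y_{2w} - y_{1w}) + c(z_{2w} - z_{1w})\right)t = 0$$

Solving for t:

$$t = -\frac{a x_{1w} + b y_{1w} + c z_{1w} + d}{a(x_{2w} - x_{1w}) + b(y_{2w} - y_{1w}) + c(z_{2w} - z_{1w})} = -\frac{w_{2p}(a x_{1p} + b y_{1p} + c z_{1p} + d w_{1p})}{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})}$$

Then:

$$\frac{1}{w_p} = \frac{1}{w_{1p}} + \left(\frac{1}{w_{2p}} - \frac{1}{w_{1p}}\right)t = \frac{1}{w_{1p}} - \frac{(w_{1p} - w_{2p})(a x_{1p} + b y_{1p} + c z_{1p} + d w_{1p})}{w_{1p}\left(w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})\right)}$$

$$= \frac{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p}) - (w_{1p} - w_{2p})(a x_{1p} + b y_{1p} + c z_{1p} + d w_{1p})}{w_{1p}\left(w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})\right)}$$

$$= \frac{a(x_{2p} - x_{1p}) + b(y_{2p} - y_{1p}) + c(z_{2p} - z_{1p}) + d(w_{2p} - w_{1p})}{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})}$$

And,

$$x_w = \frac{b(x_{1p} y_{2p} - x_{2p} y_{1p}) + c(x_{1p} z_{2p} - x_{2p} z_{1p}) + d(x_{1p} w_{2p} - x_{2p} w_{1p})}{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})}$$

$$y_w = \frac{a(y_{1p} x_{2p} - y_{2p} x_{1p}) + c(y_{1p} z_{2p} - y_{2p} z_{1p}) + d(y_{1p} w_{2p} - y_{2p} w_{1p})}{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})}$$

$$z_w = \frac{a(z_{1p} x_{2p} - z_{2p} x_{1p}) + b(z_{1p} y_{2p} - z_{2p} y_{1p}) + d(z_{1p} w_{2p} - z_{2p} w_{1p})}{w_{1p}(a x_{2p} + b y_{2p} + c z_{2p}) - w_{2p}(a x_{1p} + b y_{1p} + c z_{1p})}$$

The NHCS transform gives:

$$(x_{np},\; y_{np},\; z_{np},\; w_{np}) = C\,w_{nm}\,(x_p,\; y_p,\; z_p,\; w_p)$$

which gives:

$$(x_{1p},\; y_{1p},\; z_{1p},\; w_{1p}) = \frac{(x_{1np},\; y_{1np},\; z_{1np},\; w_{1np})}{c_1 w_{1nm}}$$

$$(x_{2p},\; y_{2p},\; z_{2p},\; w_{2p}) = \frac{(x_{2np},\; y_{2np},\; z_{2np},\; w_{2np})}{c_2 w_{2nm}}$$

Thus, the final representation of $(x_w,\; y_w,\; z_w,\; 1/w_p)$ becomes:

$$\frac{1}{w_p} = \frac{c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})}{w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})}$$

$$x_w = \frac{b(x_{1np} y_{2np} - x_{2np} y_{1np}) + c(x_{1np} z_{2np} - x_{2np} z_{1np}) + d(x_{1np} w_{2np} - x_{2np} w_{1np})}{w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})}$$

$$y_w = \frac{a(y_{1np} x_{2np} - y_{2np} x_{1np}) + c(y_{1np} z_{2np} - y_{2np} z_{1np}) + d(y_{1np} w_{2np} - y_{2np} w_{1np})}{w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})}$$

$$z_w = \frac{a(z_{1np} x_{2np} - z_{2np} x_{1np}) + b(z_{1np} y_{2np} - z_{2np} y_{1np}) + d(z_{1np} w_{2np} - z_{2np} w_{1np})}{w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})}$$

And the representation of $(x_p,\; y_p,\; z_p,\; w_p)$:

$$w_p = \frac{w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})}{c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})}$$

$$x_p = \frac{b(x_{1np} y_{2np} - x_{2np} y_{1np}) + c(x_{1np} z_{2np} - x_{2np} z_{1np}) + d(x_{1np} w_{2np} - x_{2np} w_{1np})}{c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})}$$

$$y_p = \frac{a(y_{1np} x_{2np} - y_{2np} x_{1np}) + c(y_{1np} z_{2np} - y_{2np} z_{1np}) + d(y_{1np} w_{2np} - y_{2np} w_{1np})}{c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})}$$

$$z_p = \frac{a(z_{1np} x_{2np} - z_{2np} x_{1np}) + b(z_{1np} y_{2np} - z_{2np} y_{1np}) + d(z_{1np} w_{2np} - z_{2np} w_{1np})}{c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})}$$

In case the new intersection point will participate in further clipping, it can
be written in NHCS form:

$$x_{np} = b(x_{1np} y_{2np} - x_{2np} y_{1np}) + c(x_{1np} z_{2np} - x_{2np} z_{1np}) + d(x_{1np} w_{2np} - x_{2np} w_{1np})$$

$$y_{np} = a(y_{1np} x_{2np} - y_{2np} x_{1np}) + c(y_{1np} z_{2np} - y_{2np} z_{1np}) + d(y_{1np} w_{2np} - y_{2np} w_{1np})$$

$$z_{np} = a(z_{1np} x_{2np} - z_{2np} x_{1np}) + b(z_{1np} y_{2np} - z_{2np} y_{1np}) + d(z_{1np} w_{2np} - z_{2np} w_{1np})$$

$$w_{np} = w_{1np}(a x_{2np} + b y_{2np} + c z_{2np}) - w_{2np}(a x_{1np} + b y_{1np} + c z_{1np})$$

And

$$C\,w_{nm} = c_1 w_{1nm}(a x_{2np} + b y_{2np} + c z_{2np} + d w_{2np}) - c_2 w_{2nm}(a x_{1np} + b y_{1np} + c z_{1np} + d w_{1np})$$

Here, C is the shifted bits and w is the weight, and the interpolate
parameter is: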

Miscellaneous Functions

There are some functions that have not been discussed in the previous
sections. These functions include: (1) NHCS functions that perform NHCS
conversion; and (2) EffiDigit functions that calculate the efficient digits of an
integer. These functions will now be discussed.

Calculate Efficient Digits in UFIX8

UFIX8 EffiDigit_UFIX8(UFIX8 a)

This function calculates the efficient digits in a UFIX8 integer.

Parameters

a
Input integer, unsigned 8-bit integer in UFIX8 format.

Return value

Efficient digits of the integer, which equal ceil(log2(abs(a))), in
UFIX8 format.

Remarks

Uses a binary search algorithm.





Calculate Efficient Digits in SFIX32

UFIX8 EffiDigit_SFIX32(SFIX32 a)

This function calculates the efficient digits in an SFIX32 integer.

Parameters

a
Input integer, signed 32-bit integer in SFIX32 format.

Return value

Efficient digits of the integer, which equal ceil(log2(abs(a))), in
UFIX8 format.

Remarks

Uses a binary search algorithm.






Calculate Efficient Digits in SFIX64

UFIX8 EffiDigit_SFIX64(SFIX64 a)

This function calculates the efficient digits in an SFIX64 integer.

Parameters

a
Input integer, signed 64-bit integer in SFIX64 format.

Return value

Efficient digits of the integer, which equal ceil(log2(abs(a))), in
UFIX8 format.

Remarks

Uses a binary search algorithm.
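
A binary search for the efficient digits can be sketched as below. The sketch assumes that "efficient digits" means the number of bits needed to hold the magnitude (the bit length), which coincides with the ceil(log2) expression except at exact powers of two. The function names are hypothetical, and only a 32-bit version is shown.

```c
#include <stdint.h>

/* Hypothetical sketch: bit length of an unsigned 32-bit magnitude,
 * found by halving the search range (16, 8, 4, 2, 1 bits). */
uint8_t EffiDigit_sketch_U32(uint32_t a)
{
    uint8_t digits = 0;
    if (a >> 16) { digits += 16; a >>= 16; }
    if (a >> 8)  { digits += 8;  a >>= 8;  }
    if (a >> 4)  { digits += 4;  a >>= 4;  }
    if (a >> 2)  { digits += 2;  a >>= 2;  }
    if (a >> 1)  { digits += 1;  a >>= 1;  }
    return (uint8_t)(digits + a);   /* a is now 0 or 1 */
}

/* Signed variant: takes the magnitude first, as abs(a) in the text.
 * The widening to 64 bits keeps the most negative value well defined. */
uint8_t EffiDigit_sketch_S32(int32_t a)
{
    uint32_t mag = (a < 0) ? (uint32_t)(-(int64_t)a) : (uint32_t)a;
    return EffiDigit_sketch_U32(mag);
}
```

The search takes five comparisons for 32 bits, compared with up to 32 steps for a bit-by-bit scan.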


Conversion from SFIX64Quad to SFIX32Quad NHCS

int NHCS_SFIX64Quad(SFIX64Quad a, SFIX32Quad b)

This function converts from non-NHCS to NHCS.

Parameters

a
Input integers, signed 64-bit Quad, in SFIX64Quad format.

b
Output integers, signed 32-bit Quad, in SFIX32Quad, NHCS
format.

Return value

An integer recording the shift bits from 64-bit non-NHCS to 32-bit
NHCS.

Remarks

♦ NHCS_SFIX64Quad is used in transform. In transform,
no shift is needed when the efficient digits of the maximum
component are fewer than the storage bits.

♦ Clip space vertices may be either NHCS or non-NHCS. To
recover the correct w, the shift bits must be recorded.
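
The conversion can be sketched as follows: find the largest magnitude among the four components, shift all of them by the same amount so that this component fits in 32 bits, and return the shift. The function name NHCS_sketch and the plain array parameters are assumptions for illustration; the bit count here uses a simple loop rather than the binary search of the EffiDigit functions.

```c
#include <stdint.h>

/* Hypothetical sketch: scales four 64-bit components by a common power
 * of two so that the largest fits in a signed 32-bit value.
 * Returns the number of bits shifted (0 when no shift is needed). */
int NHCS_sketch(const int64_t a[4], int32_t b[4])
{
    /* Largest magnitude among the components. */
    uint64_t max_mag = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t mag = (a[i] < 0) ? (0 - (uint64_t)a[i]) : (uint64_t)a[i];
        if (mag > max_mag)
            max_mag = mag;
    }

    /* Significant bits of the largest magnitude. */
    int digits = 0;
    while (max_mag != 0) {
        ++digits;
        max_mag >>= 1;
    }

    /* A signed 32-bit value holds 31 magnitude bits. */
    int shift = (digits > 31) ? (digits - 31) : 0;

    /* Division keeps the scaling portable for negative values. */
    for (int i = 0; i < 4; ++i)
        b[i] = (int32_t)(a[i] / ((int64_t)1 << shift));

    return shift;
}
```

Because all four components share one shift, their ratios are preserved, and the returned shift is what allows the correct w to be recovered later.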

Conversion from SFIX64Triple to SFIX16Triple NHCS

void NHCS_SFIX64Triple(SFIX64Triple a, SFIX16Triple b)

This function performs NHCS conversion.

Parameters

a
Input integers, signed 64-bit Triple, non-NHCS.

b
Output integers, signed 16-bit Triple, NHCS.

Return value

No return value.

Remarks

♦ NHCS_SFIX64Triple is used in lighting before
normalization. Whether or not the efficient digits of the maximum
component are fewer than the storage bits, a shift is needed
to preserve precision.


Conversion from SFIX32Triple to SFIX16Triple NHCS

void NHCS_SFIX32Triple(SFIX32Triple a, SFIX16Triple b)

This function performs NHCS conversion.

Parameters

a
Input integers, signed 32-bit Triple, non-NHCS.

b
Output integers, signed 16-bit Triple, NHCS.

Return value

No return value.

Remarks

♦ NHCS_SFIX32Triple is used in lighting before
normalization. Whether or not the efficient digits of the maximum
component are fewer than the storage bits, a shift is needed
to preserve precision.



The foregoing description of the invention has been presented for the
purposes of illustration and description. It is not intended to be exhaustive or to
limit the invention to the precise form disclosed. Many modifications and
variations are possible in light of the above teaching. It is intended that the
scope of the invention be limited not by this detailed description of the invention,
but rather by the claims appended hereto.